CN113705430B - Form detection method, device, equipment and storage medium based on detection model
- Publication number: CN113705430B
- Application number: CN202110989638.2A
- Authority: CN (China)
- Legal status: Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
Abstract
The application relates to artificial intelligence, and in particular to target detection, and provides a table detection method, apparatus, computer device and storage medium based on a detection model. The method comprises the following steps: acquiring a document image; extracting a document feature map from the document image based on a feature map extraction sub-network of a table detection model; determining prediction information of the document feature map based on a prediction sub-network of the table detection model, the prediction information comprising at least a first position of a table key point on the document feature map and the position offset of the projection position of that first position on the document image; determining a second position of the table key point on the document image according to the prediction information based on a preset position determination rule; and determining a table region in the document image according to the second positions of a plurality of table key points. The application also relates to blockchain technology: the resulting table regions may be stored in a blockchain. The document image may be an image of a document such as a medical record or an examination sheet.
Description
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a method, an apparatus, a computer device, and a storage medium for detecting a table based on a detection model.
Background
Existing table detection methods mainly fall into two categories: methods based on computer vision and methods based on semantic segmentation. Computer-vision-based table detection mainly adopts anchor-based target detection algorithms such as YOLO and Faster R-CNN. Whether table detection is realized by an anchor-based target detection algorithm or by semantic segmentation, an ideal detection result cannot be obtained for special tables such as long, narrow tables. For example, with an anchor-based target detection algorithm, an anchor is difficult to match to an elongated table, so the table cannot be detected, and an accurate detection result cannot be obtained for a table that is inclined or distorted to some degree; with semantic segmentation, an elongated table produces foreground regions that are difficult to separate, and a table that is inclined or distorted to some degree leads to long detection times.
Disclosure of Invention
The application provides a table detection method, apparatus, computer device and storage medium based on a detection model, which realize table detection on the basis of table key point detection.
In a first aspect, the present application provides a table detection method based on a detection model, which is characterized in that the method includes:
acquiring a document image, wherein the size of the document image is a preset first size;
extracting a characteristic diagram of the document image based on a characteristic diagram extraction sub-network of a table detection model, wherein the size of the document characteristic diagram is a preset second size, and the first size is larger than the second size;
determining prediction information of the document feature map based on a prediction subnetwork of the form detection model, wherein the prediction information at least comprises a first position of a form key point on the document feature map and a position offset of a projection position of the first position on the document image;
determining a second position of the table key point on the document image according to the prediction information based on a preset position determination rule;
and determining a table area in the document image according to the second positions of the table key points.
In a second aspect, the present application provides a form detection device based on a detection model, which is characterized by comprising:
an image acquisition module, configured to acquire a document image, wherein the size of the document image is a preset first size;
a feature map extraction module, configured to extract a document feature map from the document image based on a feature map extraction sub-network of the table detection model, wherein the size of the document feature map is a preset second size, and the first size is larger than the second size;
a prediction information acquisition module, configured to determine prediction information of the document feature map based on a prediction sub-network of the table detection model, wherein the prediction information comprises at least a first position of a table key point on the document feature map and the position offset of the projection position of that first position on the document image;
a key point determination module, configured to determine a second position of the table key point on the document image according to the prediction information based on a preset position determination rule;
and a table determination module, configured to determine a table region in the document image according to the second positions of a plurality of table key points.
In a third aspect, the present application provides a computer device comprising a memory and a processor; the memory is used for storing a computer program; the processor is configured to execute the computer program and implement the table detection method based on the detection model when the computer program is executed.
In a fourth aspect, the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above table detection method based on the detection model.
The application discloses a table detection method, apparatus, computer device and storage medium based on a detection model. A document image is acquired, the size of the document image being a preset first size; a document feature map is extracted from the document image based on a feature map extraction sub-network of a table detection model, the size of the document feature map being a preset second size smaller than the first size; prediction information of the document feature map is determined based on a prediction sub-network of the table detection model, the prediction information comprising at least a first position of a table key point on the document feature map and the position offset of the projection position of that first position on the document image; a second position of the table key point on the document image is determined from the prediction information based on a preset position determination rule; and a table region in the document image is determined according to the second positions of the table key points. Table detection is thus completed on the basis of table key point detection, so that special tables, such as long, narrow tables and tables that are inclined or distorted to some degree, can be detected effectively. When the key points are detected, the position of a table key point on the document image is determined by combining its first position on the document feature map with the position offset of the projection position of that first position on the document image, which improves the accuracy of key point detection and therefore the accuracy of table detection.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a table detection method based on a detection model according to an embodiment of the present application;
FIG. 2 is a schematic block diagram of a form detection model according to an embodiment of the present application;
FIG. 3 is a schematic block diagram of a table detection device based on a detection model according to an embodiment of the present application;
Fig. 4 is a schematic block diagram of a computer device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The flow diagrams depicted in the figures are merely illustrative and not necessarily all of the elements and operations/steps are included or performed in the order described. For example, some operations/steps may be further divided, combined, or partially combined, so that the order of actual execution may be changed according to actual situations. In addition, although the division of the functional modules is performed in the apparatus schematic, in some cases, the division of the modules may be different from that in the apparatus schematic.
The embodiments of the present application provide a table detection method, apparatus, computer device and computer-readable storage medium based on a detection model. The method realizes table detection on the basis of table key point detection and improves the accuracy of table detection. In table detection, special tables are often encountered that existing methods cannot detect accurately, for example long, narrow tables such as a table with 2 rows and 10 columns, or tables that appear in the document image with some inclination or distortion. When the table key points are detected, the second position of a table key point on the document image is determined by combining its first position on the document feature map with the position offset of the projection position of that first position on the document image, which improves the accuracy of key point detection and therefore the accuracy of table detection.
Some embodiments of the present application are described in detail below with reference to the accompanying drawings. The following embodiments and features of the embodiments may be combined with each other without conflict.
Referring to fig. 1, fig. 1 is a schematic flow chart of a table detection method based on a detection model according to an embodiment of the application.
As shown in fig. 1, the table detection method based on the detection model may include the following steps S110 to S150.
Step S110, acquiring a document image, wherein the size of the document image is a preset first size.
Illustratively, the document image of the document to be detected is obtained by scanning or by format conversion. For example, the document to be detected is converted by format conversion into a color picture of the first size to acquire the document image. In one embodiment, the first size is height × width = 512 × 512; each pixel of the document image carries three color (RGB) channels, so the dimension of each pixel is 3.
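As an illustration only, a minimal sketch of this acquisition step follows, assuming OpenCV is used for loading and resizing; the function name and the 512 × 512 first size come from the embodiment above, while everything else is an assumption rather than part of the application.

```python
import cv2
import numpy as np

def acquire_document_image(path: str, first_size: int = 512) -> np.ndarray:
    """Load a document page and resize it to the preset first size (512 x 512, RGB)."""
    img = cv2.imread(path)                      # BGR array of shape (H, W, 3)
    if img is None:
        raise FileNotFoundError(path)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # three colour channels, dimension 3 per pixel
    return cv2.resize(img, (first_size, first_size))
```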
Step S120, extracting a sub-network based on a feature map of a table detection model, and extracting a document feature map from the document image, wherein the size of the document feature map is a preset second size, and the first size is larger than the second size.
A feature map comprises a plurality of features corresponding to different receptive fields; for the document feature map, for example, a receptive field may be understood as all the pixels of a small area on the document image.
In some embodiments, the structure of the table detection model is shown in fig. 2, and the table detection model includes a feature map extraction sub-network for acquiring a feature map and a prediction sub-network for acquiring prediction information, where the feature map extraction sub-network includes a main sub-network and an up-sampling sub-network, and the prediction sub-network includes a plurality of branch sub-networks.
Illustratively, step S120 includes steps S121-S122.
S121, inputting the document image into a main sub-network of the feature map extraction sub-network to obtain a primary feature map;
Illustratively, the backbone sub-network is a convolutional neural network (CNN); the convolution kernels in the convolutional neural network extract features of the document image to generate a feature map. For example, in one embodiment, the backbone sub-network uses the convolutional neural network ShuffleNet V2; the document image is input into the backbone sub-network to obtain a primary feature map feature_map1 with a size of height × width = 16 × 16, where the dimension of each pixel in feature_map1 is 64.
ShuffleNet V2 is a lightweight convolutional neural network that strikes a good balance between speed and accuracy; a table detection method based on ShuffleNet V2 can therefore be conveniently deployed on mobile terminal devices.
S122, inputting the primary feature map into an up-sampling sub-network of the feature map extraction sub-network to obtain the document feature map.
Upsampling refers to a technique of giving an image a higher resolution, and by upsampling the primary feature map, the obtained document feature map has a higher resolution and a larger size than the primary feature map. The up-sampling sub-network may implement up-sampling by bilinear interpolation, transposed convolution, etc.
Illustratively, the height and width of the first size are N times the height and width of the second size, respectively, where N is a positive integer greater than or equal to 2. For example, in one embodiment, feature_map1 is input into the up-sampling sub-network of the feature map extraction sub-network for up-sampling, so as to obtain the document feature map feature_map2 with a second size of height × width = 128 × 128; the height and width of the first size are then 4 times the height and width of the second size, respectively.
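A sketch of one possible feature map extraction sub-network in PyTorch follows, assuming torchvision's ShuffleNet V2 as the backbone sub-network; the 1 × 1 channel-reduction convolution (1024 to 64) and the factor-8 bilinear up-sampling are assumptions chosen so that the sizes match the embodiment (512 × 512 input, 16 × 16 × 64 primary feature map, 128 × 128 document feature map).

```python
import torch
import torch.nn as nn
import torchvision

class FeatureMapExtractor(nn.Module):
    """Feature map extraction sub-network: ShuffleNet V2 backbone + up-sampling sub-network.
    512x512x3 document image -> 16x16x64 primary feature map -> 128x128x64 document feature map."""

    def __init__(self):
        super().__init__()
        net = torchvision.models.shufflenet_v2_x1_0(weights=None)
        # Backbone sub-network: keep the convolutional stages, drop the classifier head.
        self.backbone = nn.Sequential(
            net.conv1, net.maxpool, net.stage2, net.stage3, net.stage4, net.conv5,
        )  # overall stride 32, so 512 -> 16
        self.reduce = nn.Conv2d(1024, 64, kernel_size=1)  # project to 64 channels (assumption)
        # Up-sampling sub-network; bilinear interpolation with factor 8, so 16 -> 128.
        self.upsample = nn.Upsample(scale_factor=8, mode="bilinear", align_corners=False)

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        feature_map1 = self.reduce(self.backbone(image))  # primary feature map
        feature_map2 = self.upsample(feature_map1)        # document feature map
        return feature_map2
```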
Illustratively, the up-sampling sub-network may be adjusted according to the density of tables in the document images. For example, when dense tables are expected, the network structure of the up-sampling sub-network may be adjusted to enlarge the second size of the document feature map, so as to obtain a better detection result for dense tables.
And step S130, based on a prediction sub-network of the form detection model, determining prediction information of the document feature map, wherein the prediction information at least comprises a first position of a form key point on the document feature map and a position offset of a projection position of the first position on the document image.
Illustratively, the table center point and the table vertices are all table key points.
Illustratively, the prediction sub-network includes a plurality of branch sub-networks for acquiring different portions of the prediction information, and step S130 specifically includes steps S131-S134:
s131, inputting the document feature map into a first branch sub-network of the prediction sub-network to obtain a first position of a table center point on the document feature map;
For example, feature_map2 is input into the first branch sub-network of the prediction sub-network to obtain, in the output of the first branch sub-network, a first position c1 of the table center point on the document feature map; the first positions, second positions and projection positions of all key points can be expressed as coordinates, for example, c1 is the coordinate (x1, y1).
In some embodiments, the prediction information further includes a table confidence; for example, the first branch sub-network is further configured to obtain the table confidence, and the detection-model-based table detection method further includes: if the table confidence is smaller than a preset threshold value, outputting prompt information that the document does not contain a table.
S132, inputting the document feature map into a second branch sub-network of the prediction sub-network to obtain a first position of a table vertex on the document feature map;
For example, feature_map2 is input into the second branch sub-network of the prediction sub-network to obtain a first position q1 of the upper-left table vertex, a first position q2 of the upper-right table vertex, a first position q3 of the lower-left table vertex, and a first position q4 of the lower-right table vertex, where the upper-left table vertex is the table vertex located at the upper left of the table, the upper-right table vertex is the table vertex located at the upper right of the table, the lower-left table vertex is the table vertex located at the lower left of the table, and the lower-right table vertex is the table vertex located at the lower right of the table.
S133, inputting the document feature map into a third branch sub-network of the prediction sub-network to acquire the offset of the projection position corresponding to the form center point on the document image;
For example, feature_map2 is input to the third branch sub-network of the prediction sub-network, so as to obtain, in the output of the third branch sub-network, an offset delta_c1 of the projection position corresponding to the table center point on the document image.
S134, inputting the document feature map into a fourth branch sub-network of the prediction sub-network to obtain the offset of the projection position corresponding to the table vertex on the document image.
For example, feature_map2 is input to a fourth branch subnetwork of the prediction subnetwork to obtain an offset delta_q1 of a projection position corresponding to an upper left table vertex, an offset delta_q2 of a projection position corresponding to an upper right table vertex, an offset delta_q3 of a projection position corresponding to a lower left table vertex, and an offset delta_q4 of a projection position corresponding to a lower right table vertex, respectively.
In some embodiments, the prediction information further includes a displacement between a table vertex and a table center point on the document image, and step S130 further includes step S135.
S135, inputting the document feature map into a fifth branch sub-network of the prediction sub-network to obtain displacement between the table top point and the table center point on the document image.
For example, feature_map2 is input into the fifth branch sub-network of the prediction sub-network to obtain, in the output of the fifth branch sub-network, the displacement delta_p1 from the table center point to the upper-left table vertex, the displacement delta_p2 from the table center point to the upper-right table vertex, the displacement delta_p3 from the table center point to the lower-left table vertex, and the displacement delta_p4 from the table center point to the lower-right table vertex on the document image.
In some embodiments, the prediction information further includes a width of the table and a height of the table on the document image, and step S130 further includes step S136.
S136, inputting the document feature map into a sixth branch sub-network of the prediction sub-network to acquire the width of the table and the height of the table on the document image.
For example, feature_map2 is input to the sixth branch subnetwork of the predictive subnetwork to obtain the height h of the form and the width w of the form on the document image in the output of the sixth branch subnetwork.
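The branch structure can be pictured with the sketch below; the application fixes only which quantity each branch sub-network predicts, so the single 1 × 1 convolution per branch and the channel counts are illustrative assumptions.

```python
import torch.nn as nn

class PredictionSubNetwork(nn.Module):
    """Prediction sub-network with six branch sub-networks over the 128x128x64
    document feature map; channel counts per branch are illustrative assumptions."""

    def __init__(self, in_ch: int = 64):
        super().__init__()
        self.center_heat = nn.Conv2d(in_ch, 1, 1)  # S131: table center point heatmap -> c1 (peak value can serve as table confidence)
        self.vertex_heat = nn.Conv2d(in_ch, 4, 1)  # S132: four table vertex heatmaps -> q1..q4
        self.center_off = nn.Conv2d(in_ch, 2, 1)   # S133: delta_c1, offset of the center's projection position
        self.vertex_off = nn.Conv2d(in_ch, 8, 1)   # S134: delta_q1..delta_q4, one (dx, dy) per vertex
        self.center2vert = nn.Conv2d(in_ch, 8, 1)  # S135: delta_p1..delta_p4, center-to-vertex displacements
        self.size = nn.Conv2d(in_ch, 2, 1)         # S136: table width w and height h on the document image

    def forward(self, fmap):
        return {
            "center_heat": self.center_heat(fmap).sigmoid(),
            "vertex_heat": self.vertex_heat(fmap).sigmoid(),
            "center_off": self.center_off(fmap),
            "vertex_off": self.vertex_off(fmap),
            "center2vert": self.center2vert(fmap),
            "size": self.size(fmap),
        }
```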
Step S140, based on a preset position determining rule, determining a second position of the table key point on the document image according to the prediction information.
Illustratively, step S140 includes steps S141-S142:
s141, converting the first position of the key point of the table according to the ratio of the first size to the second size to obtain the projection position of the first position on the document image;
The step S141 specifically includes: and converting the first positions of all the form key points according to the ratio of the first size to the second size to obtain the projection positions of all the first positions on the document image.
For example, in one embodiment, since the height and width of the first size are 4 times the height and width of the second size, c1 is multiplied by 4 in both the height and width directions to obtain the projection position c1_t corresponding to the table center point; likewise, q1, q2, q3 and q4 are multiplied by 4 in both directions to obtain the projection positions q1_t, q2_t, q3_t and q4_t corresponding to the upper-left, upper-right, lower-left and lower-right table vertices, respectively.
S142, determining a second position of the table key point on the document image according to the projection position corresponding to the table key point and the position offset of the projection position.
In some embodiments, step S142 specifically includes: determining the second position of each table key point on the document image according to the projection position corresponding to that table key point and the position offset of the projection position. For example, the second position of the table center point is C1 = c1_t + delta_c1, the second position of the upper-left table vertex is q1_t + delta_q1, the second position of the upper-right table vertex is q2_t + delta_q2, the second position of the lower-left table vertex is q3_t + delta_q3, and the second position of the lower-right table vertex is q4_t + delta_q4.
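Steps S141-S142 thus reduce to a scale-and-shift per key point, as in the following sketch; the coordinate tuples and the numbers in the closing comment are hypothetical.

```python
def decode_keypoint(first_pos, offset, n=4):
    """S141-S142: multiply the first position on the document feature map by the
    size ratio N to get the projection position on the document image, then add
    the predicted position offset to get the second position."""
    x1, y1 = first_pos
    dx, dy = offset
    x_t, y_t = x1 * n, y1 * n          # projection position, e.g. c1_t = c1 * 4
    return (x_t + dx, y_t + dy)        # second position, e.g. C1 = c1_t + delta_c1

# Hypothetical numbers: a centre at (30, 21) on the 128x128 map with offset
# (1.5, -0.5) decodes to (121.5, 83.5) on the 512x512 document image.
```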
In still other embodiments, the prediction information further includes a displacement between a table vertex and a table center point on the document image, and step S142 specifically includes: determining a second position of a form center point on the document image according to a projection position corresponding to the form center point on the document feature map and the position offset of the projection position; step S140 also includes steps S143-S145.
S143, determining a first candidate position of the table vertex on the document image according to the projection position corresponding to the table vertex on the document feature map and the position offset of the projection position;
illustratively, the projection positions corresponding to the table vertices on the document feature map and the position offsets of the projection positions are added to obtain a first candidate position of the table vertices on the document image.
For example, the first candidate position q1=q1_t+delta_q1 for the top-left table vertex, the first candidate position q2=q2_t+delta_q2 for the top-right table vertex, the first candidate position q3=q3_t+delta_q3 for the bottom-left table vertex, and the first candidate position q4=q4_t+delta_q4 for the bottom-right table vertex.
S144, obtaining a second candidate position of the table vertex on the document image according to the projection position of the table center point and the displacement between the table vertex and the table center point on the document image;
Illustratively, the displacement between the table vertex and the table center point on the document image is a displacement from the table center point to the table vertex, and the projected position of the table center point is added to the displacement between the table vertex and the table center point on the document image to obtain the second candidate position of the table vertex on the document image.
For example, the second candidate position p1=c1_t+delta_p1 for the top-left table vertex, the second candidate position p2=c1_t+delta_p2 for the top-right table vertex, the second candidate position p3=c1_t+delta_p3 for the bottom-left table vertex, and the second candidate position p4=c1_t+delta_p4 for the bottom-right table vertex.
S145, determining a second position of the table vertex according to the first candidate position and the second candidate position.
In some embodiments, the second location of the table vertex is determined from an average of the first candidate location and the second candidate location.
In still other embodiments, the predictive information further includes a width of the form and a height of the form on the document image, and step S145 specifically includes steps S145a-S145c.
S145a, determining a reference frame of the table on the document image according to the second position of the center point of the table, the width of the table on the document image and the height of the table;
For example, a rectangle is determined on the document image by taking the second position C1 of the table center point as the rectangle's center point, the width w of the table as its width, and the height h of the table as its height; this rectangular frame is determined as the reference frame box of the table.
S145b, if all the first candidate positions of the table vertices are within the reference frame, determining the first candidate positions of the table vertices as second positions of the table vertices;
For example, if Q1, Q2, Q3, and Q4 are all within the box, then the second position of the top left table vertex, the second position of the top right table vertex, the second position of the bottom left table vertex, and the second position of the bottom right table vertex are Q1, Q2, Q3, and Q4 in that order.
And S145c, if the first candidate position of the table vertex is out of the reference frame, determining that the second candidate position of the table vertex is the second position of the table vertex.
For example, if any of Q1, Q2, Q3, and Q4 is outside the box, the second position of the top-left table vertex, the second position of the top-right table vertex, the second position of the bottom-left table vertex, and the second position of the bottom-right table vertex are P1, P2, P3, and P4 in that order.
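A sketch of the selection rule of steps S145a-S145c, assuming (x, y) coordinate tuples; first_candidates are the offset-based candidates Q1..Q4 and second_candidates the displacement-based candidates P1..P4 described above.

```python
def select_vertex_positions(center_second, w, h, first_candidates, second_candidates):
    """S145a-S145c: build the reference box from the centre's second position and the
    predicted table width/height; keep the offset-based candidates Q1..Q4 only if all
    of them fall inside the box, otherwise fall back to the displacement-based P1..P4."""
    cx, cy = center_second
    x_min, x_max = cx - w / 2.0, cx + w / 2.0
    y_min, y_max = cy - h / 2.0, cy + h / 2.0
    inside = all(x_min <= x <= x_max and y_min <= y <= y_max
                 for x, y in first_candidates)
    return list(first_candidates) if inside else list(second_candidates)
```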
S150, determining a table area in the document image according to the second positions of the table key points.
Illustratively, the second positions of the four table vertices of one table are taken as the positions of the four vertices of one quadrangle on the document image to determine one quadrangle on the document image, and the area of the quadrangle is determined as the table area.
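Illustratively, the quadrilateral region could be materialized as a binary mask, as sketched below with OpenCV; the mask representation and the vertex ordering are assumptions, since the application only requires determining the area of the quadrilateral as the table region.

```python
import cv2
import numpy as np

def table_region_mask(image_shape, vertices):
    """S150: fill the quadrilateral spanned by the four vertex second positions; the
    filled area is the detected table region. Vertices are expected in polygon order
    (upper-left, upper-right, lower-right, lower-left)."""
    mask = np.zeros(image_shape[:2], dtype=np.uint8)
    pts = np.array(vertices, dtype=np.int32).reshape(-1, 1, 2)
    cv2.fillPoly(mask, [pts], 255)
    return mask
```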
In some implementations, the table regions obtained from the document image can be stored in blockchain nodes. The blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association using cryptographic methods, where each data block contains information about a batch of network transactions and is used to verify the validity (anti-counterfeiting) of that information and to generate the next block. A blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
In some embodiments, the detection-model-based table detection method further comprises training the table detection model through steps S100-S108:
S100, acquiring a plurality of training samples, wherein at least a part of the training samples are document images containing tables, and the document images containing the tables have labeling information of key points of the tables;
Illustratively, documents for training are collected and, referring to step S110, document images of these documents are acquired by scanning or format conversion to obtain the training samples.
S101, obtaining a standard second position of the table key point according to the marking information of the table key point;
Illustratively, the labeling information includes labeling information of a table center point and labeling information of all table vertices, and the standard second position of the table center point and the standard second position of the table vertices are obtained in step S101.
S102, converting the standard second position of the table key point according to the ratio of the second size to the first size to obtain the standard first position of the table key point;
Illustratively, the standard second position of each table key point is multiplied by the ratio of the second size to the first size and rounded, to obtain the standard first position of that key point.
S103, converting the standard first position of the key point of the table according to the ratio of the first size to the second size so as to correspondingly obtain a standard projection position;
Illustratively, the standard first position of each table key point is multiplied by the ratio of the first size to the second size, to obtain the standard projection position corresponding to that key point.
S104, determining the position offset of the standard projection position according to the standard second position of the table key point and the standard projection position corresponding to the table key point;
Illustratively, the standard projection position corresponding to the table key point is subtracted from the standard second position of that key point, to obtain the position offset of the standard projection position.
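Steps S101-S104 amount to quantizing each annotation onto the feature map grid and recording the rounding error as the offset target; a sketch with the ratio N = 4 of the embodiment and hypothetical coordinates follows.

```python
def make_keypoint_label(std_second, n=4):
    """S101-S104: derive the standard first position on the feature map and the
    standard position offset from an annotated (standard) second position on the
    document image, with N the ratio of the first size to the second size."""
    x2, y2 = std_second
    std_first = (round(x2 / n), round(y2 / n))          # S102: scale down and round
    std_proj = (std_first[0] * n, std_first[1] * n)     # S103: project back up
    std_offset = (x2 - std_proj[0], y2 - std_proj[1])   # S104: second minus projection
    return std_first, std_offset

# Hypothetical annotation at (213, 87): standard first position (53, 22),
# standard projection (212, 88), standard offset (1, -1) -- the quantisation
# error that the offset branches are trained to predict.
```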
S105, determining the label of each training sample, wherein the label of a document image containing a table comprises at least the standard first position and the position offset of the standard projection position;
Illustratively, the data in the label corresponds to the data in the prediction information.
S106, extracting a sub-network based on the feature map of the table detection model, and extracting a document feature map of a training sample;
Illustratively, the training samples are input into a feature map extraction sub-network of a form detection model to obtain a document feature map of the training samples in an output of the feature map extraction sub-network. For the specific step of step S106, reference may be made to the specific step of step S120.
S107, determining the prediction information of the training sample according to the document feature map of the training sample based on the prediction subnetwork of the table detection model;
Illustratively, the document feature map of the training sample is input into a prediction sub-network of the form detection model to obtain prediction information of the training sample in an output of the prediction sub-network. For the specific step of step S107, reference may be made to the specific step of step S130.
S108, according to the errors of the labels of the training samples and the prediction information of the training samples, adjusting the network parameters of the form detection model.
Illustratively, the error is back-propagated through the table detection model, and the network parameters of the table detection model are adjusted to reduce the error.
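A sketch of one training iteration over steps S106-S108 in PyTorch; treating the model's output as a single tensor and using an L1 regression loss are assumptions, since the application only requires back-propagating the error between the labels and the prediction information.

```python
import torch

def train_step(model, optimizer, sample, label):
    """S106-S108: forward pass through the feature map extraction and prediction
    sub-networks, compute the error against the label, back-propagate it, and
    update the network parameters to reduce the error."""
    optimizer.zero_grad()
    prediction = model(sample)                              # prediction information of the training sample
    loss = torch.nn.functional.l1_loss(prediction, label)   # error vs. label (L1 is an assumption)
    loss.backward()                                         # back-propagate the error
    optimizer.step()                                        # adjust network parameters
    return loss.item()
```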
Illustratively, steps S106-S108 are performed iteratively to train the form detection model, and if the form detection model converges, the iteration is stopped to obtain the form detection model for performing steps S110-S150.
According to the table detection method based on the detection model described above, a document image is acquired, the size of the document image being a preset first size; a document feature map is extracted from the document image based on the feature map extraction sub-network of the table detection model, the size of the document feature map being a preset second size smaller than the first size; prediction information of the document feature map is determined based on the prediction sub-network of the table detection model, the prediction information comprising at least a first position of a table key point on the document feature map and the position offset of the projection position of that first position on the document image; a second position of the table key point on the document image is determined from the prediction information based on a preset position determination rule; and a table region in the document image is determined according to the second positions of the table key points. Table detection is thus completed on the basis of table key point detection, so that special tables, such as long, narrow tables and tables that are inclined or distorted to some degree, can be detected effectively. When the key points are detected, the position of a table key point on the document image is determined by combining its first position on the document feature map with the position offset of the projection position of that first position on the document image, which improves the accuracy of key point detection and therefore the accuracy of table detection.
The embodiments of the present application may acquire and process related data based on artificial intelligence technology; for example, the prediction information of the document image is acquired through the table detection model. Artificial intelligence (AI) is the theory, method, technology and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results.
Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and other directions.
Target detection is a subfield of computer vision: a target is located in an image, and both its class and its position are determined. Target detection is widely applied in robot navigation, intelligent video surveillance, industrial inspection, aerospace and other fields; by substituting computer vision for human labor it reduces labor costs, and therefore has important practical significance.
In the field of target detection, an anchor can be understood as a sliding window centered on an anchor point, where an anchor point is a reference point preset on the image; anchor-based target detection determines the exact position and size of a target by finding the anchor that best matches the target. At present, computer-vision-based table detection mainly relies on anchor-based target detection, and when special tables appear, such as long, narrow tables or tables with some inclination or distortion in the document image, an anchor is difficult to match to the table, so the table cannot be accurately detected. The invention realizes table detection on the basis of table key point detection; it is an anchor-free target detection method and can obtain better detection results for special tables such as long, narrow tables and tables with some inclination or distortion in the document image.
The invention can also be applied in the medical field; for example, the document images may be images of medical records, examination sheets, and the like.
As shown in fig. 3, the table detection device based on the detection model includes: an image acquisition module 110, a feature map extraction module 120, a prediction information acquisition module 130, a key point determination module 140, and a table determination module 150.
An image acquisition module 110, configured to acquire a document image, where a size of the document image is a preset first size;
a feature map extraction module 120, configured to extract a document feature map from the document image based on the feature map extraction sub-network of the table detection model, where the size of the document feature map is a preset second size and the first size is larger than the second size;
A prediction information obtaining module 130, configured to determine, based on a prediction subnetwork of the table detection model, prediction information of the document feature map, where the prediction information includes at least a first position of a table key point on the document feature map and a position offset of a projection position of the first position on the document image;
a key point determining module 140, configured to determine, based on a preset location determining rule, a second location of the table key point on the document image according to the prediction information;
The form determining module 150 is configured to determine a form area in the document image according to the second positions of the plurality of form keypoints.
The feature map extraction module 120 illustratively includes a primary feature map extraction module and an upsampling module.
The primary feature map extraction module is used for inputting the document image into a main sub-network of the feature map extraction sub-network so as to obtain a primary feature map;
and the up-sampling module is used for inputting the primary feature map into an up-sampling sub-network of the feature map extraction sub-network so as to obtain the document feature map.
Illustratively, the prediction information acquisition module 130 is specifically configured to: inputting the document feature map into a first branch sub-network of the prediction sub-network to obtain a first position of a table center point on the document feature map; inputting the document feature map into a second branch sub-network of the prediction sub-network to obtain a first position of a table vertex on the document feature map; inputting the document feature map into a third branch sub-network of the prediction sub-network to obtain the offset of the projection position corresponding to the form center point on the document image; and inputting the document feature map into a fourth branch sub-network of the prediction sub-network to obtain the offset of the projection position corresponding to the table vertex on the document image.
In some embodiments, the keypoint determination module 140 is specifically configured to: converting the first position of the key point of the table according to the ratio of the first size to the second size to obtain the projection position of the first position on the document image; and determining a second position of the table key point on the document image according to the projection position corresponding to the table key point and the position offset of the projection position.
In some other embodiments, the prediction information further includes a displacement between a table vertex and a table center point on the document image, and the key point determining module 140 specifically includes a projection position determining unit, a table center point position determining unit, a table vertex first candidate position determining unit, a table vertex second candidate position determining unit, and a table vertex position determining unit.
A projection position determining unit configured to convert first positions of all the form keypoints according to a ratio of the first size to the second size to obtain projection positions of all the first positions on the document image;
a table center point position determining unit, configured to determine a second position of a table center point on the document image according to a projection position corresponding to the table center point on the document feature map and a position offset of the projection position;
A table vertex first candidate position determining unit, configured to determine a first candidate position of a table vertex on the document image according to a projection position corresponding to the table vertex on the document feature map and a position offset of the projection position;
A table vertex second candidate position determining unit, configured to obtain a second candidate position of a table vertex on the document image according to a projection position of the table center point and a displacement between the table vertex and the table center point on the document image;
And a table vertex position determining unit, configured to determine a second position of the table vertex according to the first candidate position and the second candidate position.
Illustratively, the prediction information further includes a width of the table and a height of the table on the document image, and the table vertex position determining unit is specifically configured to: determining a reference frame of the form on the document image according to the second position of the form center point, the width of the form on the document image and the height of the form; if all the first candidate positions of the table vertices are within the reference frame, determining the first candidate positions of the table vertices as second positions of the table vertices; and if the first candidate position of the table vertex is outside the reference frame, determining that the second candidate position of the table vertex is the second position of the table vertex.
Illustratively, the form detection apparatus further includes a form detection model training module.
The table detection model training module is configured to: acquire a plurality of training samples, wherein at least a part of the training samples are document images containing tables, and the document images containing tables carry labeling information of the table key points; obtain the standard second position of each table key point according to its labeling information; convert the standard second position of the table key point according to the ratio of the second size to the first size to obtain the standard first position of the table key point; convert the standard first position of the table key point according to the ratio of the first size to the second size to correspondingly obtain the standard projection position; determine the position offset of the standard projection position according to the standard second position of the table key point and the corresponding standard projection position; determine the label of each training sample, wherein the label of a document image containing a table comprises at least the standard first position and the position offset of the standard projection position; extract a document feature map of a training sample based on the feature map extraction sub-network of the table detection model; determine the prediction information of the training sample from its document feature map based on the prediction sub-network of the table detection model; and adjust the network parameters of the table detection model according to the error between the labels of the training samples and the prediction information of the training samples.
Referring to fig. 4, fig. 4 is a schematic diagram of a computer device according to an embodiment of the application. The computer device may be a server or a terminal.
As shown in fig. 4, the computer device includes a processor, a memory, and a network interface connected by a system bus, wherein the memory may include a non-volatile storage medium and an internal memory.
The non-volatile storage medium may store an operating system and a computer program. The computer program comprises program instructions that, when executed, cause the processor to perform any of a number of detection model-based form detection methods.
The processor is used to provide computing and control capabilities to support the operation of the entire computer device.
The internal memory provides an environment for the execution of a computer program in a non-volatile storage medium that, when executed by a processor, causes the processor to perform any of a number of detection model-based form detection methods.
The network interface is used for network communication, such as transmitting assigned tasks. It will be appreciated by those skilled in the art that the architecture shown is merely a block diagram of part of the structure associated with the solution of the present application and does not constitute a limitation on the computer device to which the solution is applied; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
It should be appreciated that the processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.
Wherein in some embodiments the processor is configured to run a computer program stored in the memory to implement the steps of: acquiring a document image, wherein the size of the document image is a preset first size; extracting a characteristic diagram of the document image based on a characteristic diagram extraction sub-network of a table detection model, wherein the size of the document characteristic diagram is a preset second size, and the first size is larger than the second size; determining prediction information of the document feature map based on a prediction subnetwork of the form detection model, wherein the prediction information at least comprises a first position of a form key point on the document feature map and a position offset of a projection position of the first position on the document image; determining a second position of the table key point on the document image according to the prediction information based on a preset position determination rule; and determining a table area in the document image according to the second positions of the table key points.
Illustratively, the processor is configured to implement a feature map extraction sub-network based on a table detection model, and when extracting a document feature map for the document image, implement: inputting the document image into a main sub-network of the feature map extraction sub-network to obtain a primary feature map; and inputting the primary feature map into an up-sampling sub-network of the feature map extraction sub-network to obtain the document feature map.
Illustratively, the processor is configured to implement a prediction sub-network based on the table detection model, and when determining the prediction information of the document feature map, implement: inputting the document feature map into a first branch sub-network of the prediction sub-network to obtain a first position of a table center point on the document feature map; inputting the document feature map into a second branch sub-network of the prediction sub-network to obtain a first position of a table vertex on the document feature map; inputting the document feature map into a third branch sub-network of the prediction sub-network to obtain the offset of the projection position corresponding to the form center point on the document image; and inputting the document feature map into a fourth branch sub-network of the prediction sub-network to obtain the offset of the projection position corresponding to the table vertex on the document image.
In some embodiments, the processor is configured to implement the preset-based location determination rule, and when determining the second location of the table keypoint on the document image according to the prediction information, implement: converting the first position of the key point of the table according to the ratio of the first size to the second size to obtain the projection position of the first position on the document image; and determining a second position of the table key point on the document image according to the projection position corresponding to the table key point and the position offset of the projection position.
In some other embodiments, the prediction information further includes a displacement between a table vertex and the table center point on the document image, and the processor, when determining the second position of the table key point on the document image according to the prediction information based on the preset position determination rule, is configured to implement: converting the first positions of all the table key points according to the ratio of the first size to the second size to obtain the projection positions of all the first positions on the document image; determining a second position of the table center point on the document image according to the projection position corresponding to the table center point on the document feature map and the position offset of that projection position; determining a first candidate position of a table vertex on the document image according to the projection position corresponding to the table vertex on the document feature map and the position offset of that projection position; obtaining a second candidate position of the table vertex on the document image according to the projection position of the table center point and the displacement between the table vertex and the table center point on the document image; and determining the second position of the table vertex according to the first candidate position and the second candidate position.
In some embodiments, the prediction information further includes the width of the table and the height of the table on the document image, and the processor, when determining the second position of the table vertex according to the first candidate position and the second candidate position, is configured to implement: determining a reference frame of the table on the document image according to the second position of the table center point, the width of the table on the document image and the height of the table; if all the first candidate positions of the table vertices are within the reference frame, determining the first candidate positions of the table vertices as the second positions of the table vertices; and if the first candidate position of a table vertex is outside the reference frame, determining the second candidate position of the table vertex as the second position of the table vertex.
Illustratively, the processor is further configured to implement: acquiring a plurality of training samples, wherein at least one part of the training samples are document images containing tables, and the document images containing the tables are provided with labeling information of key points of the tables; obtaining a standard second position of the table key point according to the marking information of the table key point; converting the second position of the standard of the table key point according to the ratio of the second size to the first size to obtain the first position of the standard of the table key point; converting the standard first position of the key point of the table according to the ratio of the first size to the second size to correspondingly obtain the standard projection position; determining the position offset of the standard projection position according to the standard second position of the table key point and the standard projection position corresponding to the table key point; determining labels of all training samples, wherein the labels of the document image containing the table at least comprise a first position of the standard and a position correction amount of a projection position of the standard; extracting a sub-network based on the feature map of the form detection model, and extracting a document feature map of a training sample; based on a prediction subnetwork of the form detection model, determining prediction information of the training sample according to a document feature map of the training sample; and adjusting network parameters of the form detection model according to errors of the labels of the training samples and the prediction information of the training samples.
From the above description of the embodiments, it will be apparent to those skilled in the art that the present application may be implemented by software together with a necessary general-purpose hardware platform. Based on such understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product. The software product may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, or an optical disk, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments, or in some parts of the embodiments, of the present application, for example:
A computer readable storage medium storing a computer program, where the computer program includes program instructions which, when executed by a processor, implement any of the table detection methods based on the detection model provided by the embodiments of the present application.
The computer readable storage medium may be an internal storage unit of the computer device of the foregoing embodiments, for example, a hard disk or a memory of the computer device. The computer readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card provided on the computer device.
While the application has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various changes and equivalent substitutions may be made without departing from the scope of the application. Therefore, the protection scope of the application is subject to the protection scope of the claims.
Claims (9)
1. A method for table detection based on a detection model, the method comprising:
acquiring a document image, wherein the size of the document image is a preset first size;
extracting a document feature map of the document image based on a feature map extraction sub-network of a table detection model, wherein the size of the document feature map is a preset second size, and the first size is larger than the second size;
determining prediction information of the document feature map based on a prediction subnetwork of the form detection model, wherein the prediction information at least comprises a first position of a form key point on the document feature map and a position offset of a projection position of the first position on the document image;
determining a second position of the table key point on the document image according to the prediction information based on a preset position determination rule;
determining a form area in the document image according to the second positions of the form key points;
wherein the prediction information further comprises a displacement between a table vertex and a table center point on the document image; and
the determining, based on a preset position determination rule, a second position of the table key point on the document image according to the prediction information comprises:
converting the first positions of all the form key points according to the ratio of the first size to the second size to obtain projection positions of all the first positions on the document image;
determining a second position of a form center point on the document image according to a projection position corresponding to the form center point on the document feature map and the position offset of the projection position;
determining a first candidate position of a table vertex on the document image according to a projection position corresponding to the table vertex on the document feature map and the position offset of the projection position;
obtaining a second candidate position of the table vertex on the document image according to the projection position of the table center point and the displacement between the table vertex and the table center point on the document image;
and determining a second position of the table vertex according to the first candidate position and the second candidate position.
2. The form detection method based on the detection model according to claim 1, wherein the extracting, based on the feature map extraction sub-network of the form detection model, a document feature map of the document image comprises:
inputting the document image into a main sub-network of the feature map extraction sub-network to obtain a primary feature map;
and inputting the primary feature map into an up-sampling sub-network of the feature map extraction sub-network to obtain the document feature map.
3. The form detection method based on the detection model according to claim 1, wherein the determining the prediction information of the document feature map based on the prediction sub-network of the form detection model comprises:
inputting the document feature map into a first branch sub-network of the prediction sub-network to obtain a first position of a table center point on the document feature map;
inputting the document feature map into a second branch sub-network of the prediction sub-network to obtain a first position of a table vertex on the document feature map;
inputting the document feature map into a third branch sub-network of the prediction sub-network to obtain the position offset of the projection position corresponding to the table center point on the document image;
and inputting the document feature map into a fourth branch sub-network of the prediction sub-network to obtain the position offset of the projection position corresponding to the table vertex on the document image.
4. The form detection method based on the detection model according to any one of claims 1-3, wherein the determining, based on a preset position determination rule, a second position of the table key points on the document image according to the prediction information comprises:
converting the first position of the table key point according to the ratio of the first size to the second size to obtain the projection position of the first position on the document image;
and determining a second position of the table key point on the document image according to the projection position corresponding to the table key point and the position offset of the projection position.
5. The form detection method based on the detection model according to claim 1, wherein:
the prediction information further comprises the width of the table and the height of the table on the document image; and
the determining a second position of the table vertex according to the first candidate position and the second candidate position comprises:
determining a reference frame of the form on the document image according to the second position of the form center point, the width of the form on the document image and the height of the form;
if all the first candidate positions of the table vertices are within the reference frame, determining the first candidate positions of the table vertices as second positions of the table vertices;
and if the first candidate position of the table vertex is outside the reference frame, determining that the second candidate position of the table vertex is the second position of the table vertex.
6. The form detection method based on the detection model according to any one of claims 1-3, further comprising:
acquiring a plurality of training samples, wherein at least a part of the training samples are document images containing tables, and the document images containing tables are provided with labeling information of the table key points;
obtaining the standard second position of the table key point according to the labeling information of the table key point;
converting the standard second position of the table key point according to the ratio of the second size to the first size to obtain the standard first position of the table key point;
converting the standard first position of the table key point according to the ratio of the first size to the second size to obtain the corresponding standard projection position;
determining the position offset of the standard projection position according to the standard second position of the table key point and the standard projection position corresponding to the table key point;
determining labels of all training samples, wherein the label of a document image containing a table comprises at least the standard first position and a position correction amount of the standard projection position;
extracting a document feature map of a training sample based on the feature map extraction sub-network of the form detection model;
determining prediction information of the training sample according to the document feature map of the training sample, based on the prediction sub-network of the form detection model;
and adjusting network parameters of the form detection model according to errors of the labels of the training samples and the prediction information of the training samples.
7. A form detection apparatus based on a detection model, the apparatus comprising:
an image acquisition module, configured to acquire a document image, wherein the size of the document image is a preset first size;
a feature map extraction module, configured to extract a document feature map from the document image based on the feature map extraction sub-network of the form detection model, wherein the size of the document feature map is a preset second size, and the first size is larger than the second size;
a prediction information acquisition module, configured to determine prediction information of the document feature map based on the prediction sub-network of the form detection model, wherein the prediction information comprises at least a first position of a form key point on the document feature map and a position offset of a projection position of the first position on the document image;
a key point determining module, configured to determine, based on a preset position determination rule, a second position of the table key point on the document image according to the prediction information; and
a form determining module, configured to determine a form area in the document image according to the second positions of a plurality of form key points;
wherein the key point determining module comprises a projection position determining unit, a table center point position determining unit, a table vertex first candidate position determining unit, a table vertex second candidate position determining unit, and a table vertex position determining unit;
a projection position determining unit configured to convert first positions of all the form keypoints according to a ratio of the first size to the second size to obtain projection positions of all the first positions on the document image;
a table center point position determining unit, configured to determine a second position of a table center point on the document image according to a projection position corresponding to the table center point on the document feature map and a position offset of the projection position;
a table vertex first candidate position determining unit, configured to determine a first candidate position of a table vertex on the document image according to a projection position corresponding to the table vertex on the document feature map and the position offset of the projection position;
a table vertex second candidate position determining unit, configured to obtain a second candidate position of a table vertex on the document image according to a projection position of the table center point and the displacement between the table vertex and the table center point on the document image;
and a table vertex position determining unit, configured to determine a second position of the table vertex according to the first candidate position and the second candidate position.
8. A computer device, the computer device comprising a memory and a processor;
the memory is configured to store a computer program; and
the processor is configured to execute the computer program and, when executing the computer program, implement the form detection method based on the detection model according to any one of claims 1 to 6.
9. A computer readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the form detection method based on the detection model according to any one of claims 1-6.
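As an implementation aid only, the four parallel branches enumerated in claim 3 might be sketched as follows in PyTorch. The channel counts, the shared 3x3-conv head design, and the sigmoid on the heatmap branches are assumptions of this sketch, not features recited in the claims:

```python
import torch
import torch.nn as nn

class PredictionSubNetwork(nn.Module):
    """Illustrative four-branch prediction sub-network (cf. claim 3).
    All channel sizes and the head design are assumptions."""

    def __init__(self, in_channels: int = 64, num_vertices: int = 4):
        super().__init__()

        def head(out_channels: int) -> nn.Sequential:
            # Each branch: a 3x3 conv followed by a 1x1 projection.
            return nn.Sequential(
                nn.Conv2d(in_channels, in_channels, 3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(in_channels, out_channels, 1),
            )

        self.center_heatmap = head(1)                # first branch: table center points
        self.vertex_heatmap = head(num_vertices)     # second branch: table vertices
        self.center_offset = head(2)                 # third branch: center offset (dx, dy)
        self.vertex_offset = head(2 * num_vertices)  # fourth branch: vertex offsets

    def forward(self, feature_map: torch.Tensor) -> dict:
        return {
            "center_heatmap": torch.sigmoid(self.center_heatmap(feature_map)),
            "vertex_heatmap": torch.sigmoid(self.vertex_heatmap(feature_map)),
            "center_offset": self.center_offset(feature_map),
            "vertex_offset": self.vertex_offset(feature_map),
        }
```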
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110989638.2A CN113705430B (en) | 2021-08-26 | 2021-08-26 | Form detection method, device, equipment and storage medium based on detection model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110989638.2A CN113705430B (en) | 2021-08-26 | 2021-08-26 | Form detection method, device, equipment and storage medium based on detection model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113705430A CN113705430A (en) | 2021-11-26 |
CN113705430B true CN113705430B (en) | 2024-07-12 |
Family
ID=78655362
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110989638.2A Active CN113705430B (en) | 2021-08-26 | 2021-08-26 | Form detection method, device, equipment and storage medium based on detection model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113705430B (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110287854A (en) * | 2019-06-20 | 2019-09-27 | 北京百度网讯科技有限公司 | Table extraction method, device, computer equipment and storage medium |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9652688B2 (en) * | 2014-11-26 | 2017-05-16 | Captricity, Inc. | Analyzing content of digital images |
Also Published As
Publication number | Publication date |
---|---|
CN113705430A (en) | 2021-11-26 |
Similar Documents
Publication | Title
---|---
WO2018153322A1 (en) | Key point detection method, neural network training method, apparatus and electronic device
CN111369581A (en) | Image processing method, device, equipment and storage medium
JP2008033424A (en) | Image processor, image processing method, program, and storage medium
CN109712071B (en) | UAV image stitching and positioning method based on track constraints
CN107766864B (en) | Method and device for extracting features and method and device for object recognition
CN114444565B (en) | Image tampering detection method, terminal equipment and storage medium
CN115375917B (en) | Target edge feature extraction method, device, terminal and storage medium
CN110399882A (en) | A text detection method based on deformable convolutional neural network
CN112825141A (en) | Method and device for recognizing text, recognition equipment and storage medium
CN113793370A (en) | Three-dimensional point cloud registration method and device, electronic equipment and readable medium
CN109190662A (en) | Three-dimensional vehicle detection method, system, terminal and storage medium based on key point regression
CN112330660A (en) | Sperm tail detection method and system based on neural network
CN112419208A (en) | Construction drawing review-based vector drawing compiling method and system
US11514702B2 (en) | Systems and methods for processing images
CN113705430B (en) | Form detection method, device, equipment and storage medium based on detection model
CN118762369A (en) | A marking recognition method, device and electronic equipment for engineering drawings
CN118053174A (en) | Quick body characteristic positioning method and system
KR102726834B1 (en) | Apparatus and method for estimating human pose based AI
CN116579963A (en) | System and method for generating dynamic image by using static image
CN116246064A (en) | A multi-scale spatial feature enhancement method and device
CN115083013A (en) | Human body posture estimation method, electronic equipment and medium
JP4792117B2 (en) | Document image processing apparatus, document image processing method, and document image processing program
CN118314601B (en) | Multi-person resolution method and device based on instance decoupling and multi-feature point fusion
CN118711166B (en) | Industrial pointer instrument detection reading recognition method, device, equipment and medium
CN116109891B (en) | Image data amplification method, device, computing equipment and storage medium
Legal Events
Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant