
CN113592876B - Training method, device, computer equipment and storage medium for a segmentation network - Google Patents

Training method, device, computer equipment and storage medium for a segmentation network

Info

Publication number
CN113592876B
CN113592876B
Authority
CN
China
Prior art keywords
geometric
region
area
predicted
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110049449.7A
Other languages
Chinese (zh)
Other versions
CN113592876A (en)
Inventor
胡一凡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202110049449.7A priority Critical patent/CN113592876B/en
Publication of CN113592876A publication Critical patent/CN113592876A/en
Application granted Critical
Publication of CN113592876B publication Critical patent/CN113592876B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/90Determination of colour characteristics
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The present application relates to a training method, a training apparatus, a computer device and a storage medium for a segmentation network. The method comprises the following steps: acquiring a sample image comprising a target geometry and a corresponding labeling geometric region; performing target segmentation processing on the sample image through a segmentation network to be trained to obtain a predicted geometric region corresponding to the target geometric figure and corresponding predicted vertex information; determining a corresponding region feature loss according to the predicted geometric region and the labeling geometric region; determining a corresponding geometric feature loss according to the predicted geometric region and the predicted polygon region determined by the predicted vertex information; and training the segmentation network to be trained based on a target loss function constructed from the region feature loss and the geometric feature loss until the training stop condition is reached, so as to obtain a trained target segmentation network. The target segmentation network is used for segmenting the target geometric figure from an image to be processed. The method can improve the accuracy and precision of the segmentation network.

Description

Training method, device, computer equipment and storage medium for a segmentation network
Technical Field
The present application relates to the field of computer technologies, and in particular, to a training method and apparatus for a segmentation network, a computer device, and a storage medium.
Background
With the development of computer technology, deep learning is widely used in various fields, such as image recognition and image segmentation. Through image recognition and image segmentation, the partial regions or information that a user needs can be extracted from an image.
However, image segmentation is a pixel-level prediction task that typically requires multiple rounds of down-sampling and up-sampling and the use of multi-scale information during recognition and segmentation. The conventional approach reduces the number of parameters by reducing the number of up/down-sampling operations and the number of channels, but this easily degrades the accuracy of image segmentation.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a training method, apparatus, computer device, and storage medium for a segmentation network that can improve the accuracy and precision of image segmentation.
A method of training a segmentation network, the method comprising:
Acquiring a sample image comprising a target geometric figure, and determining a labeling geometric region corresponding to the target geometric figure based on the sample image;
Performing target segmentation processing on the sample image through a segmentation network to be trained to obtain a predicted geometric area corresponding to the target geometric figure, and determining predicted vertex information corresponding to the target geometric figure based on image characteristics in the target segmentation processing;
Determining corresponding regional characteristic loss according to the prediction geometric region and the labeling geometric region;
determining a corresponding geometric feature loss according to the predicted geometric region and the predicted polygonal region determined by the predicted vertex information;
constructing an objective loss function based on the region feature loss and the geometric feature loss;
Training the segmentation network to be trained through the target loss function until the training stop condition is reached, so as to obtain a trained target segmentation network; the target segmentation network is used for segmenting a target geometric figure from the image to be processed.
In one embodiment, the target vertex information includes target vertex coordinates; the method further comprises the steps of:
Determining a target polygon area formed by the target vertex coordinates;
determining the intersection ratio between the area of the target polygonal area and the area of a preset area;
when the intersection ratio is greater than or equal to a threshold value, a target polygon area formed by the target vertex coordinates is segmented from the image to be processed;
and carrying out corresponding business processing based on the target polygonal area.
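For illustration, the following is a minimal Python sketch of this inference-time check, interpreting the intersection ratio as the intersection-over-union between the polygon formed by the target vertex coordinates and the preset region. The use of the shapely library, the 0.5 threshold and the bounding-box crop at the end are illustrative assumptions rather than details specified above.

```python
# Hedged sketch: IoU-gated extraction of the target polygon region.
# vertex_coords and preset_region are lists of (x, y) points; the threshold
# value and the crop strategy are assumptions for illustration only.
from shapely.geometry import Polygon

def extract_target_polygon(image, vertex_coords, preset_region, iou_threshold=0.5):
    """Return a crop of the target polygon if its intersection ratio with the
    preset region reaches the threshold, otherwise None."""
    target_poly = Polygon(vertex_coords)      # polygon formed by target vertex coordinates
    preset_poly = Polygon(preset_region)      # preset reference region
    union = target_poly.union(preset_poly).area
    iou = target_poly.intersection(preset_poly).area / union if union > 0 else 0.0
    if iou < iou_threshold:
        return None
    xs, ys = zip(*vertex_coords)              # axis-aligned bounding box of the polygon
    y0, y1 = int(min(ys)), int(max(ys)) + 1
    x0, x1 = int(min(xs)), int(max(xs)) + 1
    return image[y0:y1, x0:x1]                # crop used for subsequent business processing
```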
A training apparatus to segment a network, the apparatus comprising:
The acquisition module is used for acquiring a sample image comprising a target geometric figure and determining an annotation geometric region corresponding to the target geometric figure based on the sample image;
The prediction module is used for carrying out target segmentation processing on the sample image through a segmentation network to be trained to obtain a prediction geometric area corresponding to the target geometric figure, and determining prediction vertex information corresponding to the target geometric figure based on image characteristics in the target segmentation processing;
The regional characteristic loss determining module is used for determining corresponding regional characteristic loss according to the prediction geometric region and the labeling geometric region;
the geometric feature loss determining module is used for determining corresponding geometric feature loss according to the predicted geometric region and the predicted polygonal region determined by the predicted vertex information;
a building module for building a target loss function based on the regional feature loss and the geometric feature loss;
The training module is used for training the segmentation network to be trained through the target loss function until the training stop condition is reached, so as to obtain a trained target segmentation network; the target segmentation network is used for segmenting a target geometric figure from the image to be processed.
A computer device comprising a memory storing a computer program and a processor which when executing the computer program performs the steps of:
Acquiring a sample image comprising a target geometric figure, and determining a labeling geometric region corresponding to the target geometric figure based on the sample image;
Performing target segmentation processing on the sample image through a segmentation network to be trained to obtain a predicted geometric area corresponding to the target geometric figure, and determining predicted vertex information corresponding to the target geometric figure based on image characteristics in the target segmentation processing;
Determining corresponding regional characteristic loss according to the prediction geometric region and the labeling geometric region;
determining a corresponding geometric feature loss according to the predicted geometric region and the predicted polygonal region determined by the predicted vertex information;
constructing an objective loss function based on the region feature loss and the geometric feature loss;
Training the segmentation network to be trained through the target loss function until the training stop condition is reached, so as to obtain a trained target segmentation network; the target segmentation network is used for segmenting a target geometric figure from the image to be processed.
A computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
Acquiring a sample image comprising a target geometric figure, and determining a labeling geometric region corresponding to the target geometric figure based on the sample image;
Performing target segmentation processing on the sample image through a segmentation network to be trained to obtain a predicted geometric area corresponding to the target geometric figure, and determining predicted vertex information corresponding to the target geometric figure based on image characteristics in the target segmentation processing;
Determining corresponding regional characteristic loss according to the prediction geometric region and the labeling geometric region;
determining a corresponding geometric feature loss according to the predicted geometric region and the predicted polygonal region determined by the predicted vertex information;
constructing an objective loss function based on the region feature loss and the geometric feature loss;
Training the segmentation network to be trained through the target loss function until the training stop condition is reached, so as to obtain a trained target segmentation network; the target segmentation network is used for segmenting a target geometric figure from the image to be processed.
According to the training method, the training device, the computer equipment and the storage medium of the segmentation network, the sample image comprising the target geometric figure is subjected to target segmentation processing through the segmentation network to be trained, and the predicted geometric region, which is predicted by the segmentation network and corresponds to the target geometric figure, can be obtained. Based on the image features in the target segmentation process, the predicted vertex information of the target geometry predicted by the segmentation network in the sample image can be obtained. And constructing a target loss function according to the regional characteristic loss between the predicted geometric region and the labeling geometric region and the geometric characteristic loss between the predicted geometric region and the predicted polygonal region determined by the predicted vertex information, so that the target loss function comprises multiple loss characteristics. The segmentation network to be trained is trained based on the loss in multiple aspects, and the influence of the loss in all aspects on the recognition and segmentation of the segmentation network can be fully considered, so that the recognition and segmentation accuracy of the segmentation network can be improved through training. The target geometric figure can be accurately identified and segmented from the image through the trained target segmentation network. And the vertex information of the target geometric figure in the image can be accurately output, so that the target geometric figure in the image can be accurately positioned.
Drawings
FIG. 1 is an application environment diagram of a training method for a split network in one embodiment;
FIG. 2 is a flow chart of a training method for a split network in one embodiment;
FIG. 3 is a flowchart illustrating a step of determining predicted vertex information corresponding to a target geometry in one embodiment;
FIG. 4 is a flowchart illustrating steps for determining corresponding region feature loss based on a predicted geometric region and a labeled geometric region in one embodiment;
FIG. 5 is a flowchart illustrating steps for determining a first color value corresponding to a predicted geometric region and a second color value corresponding to a labeling geometric region according to an embodiment;
FIG. 6 is a flowchart illustrating steps for determining corresponding geometric feature loss based on a predicted geometric region and a predicted polygon region determined from predicted vertex information, according to one embodiment;
FIG. 7 is a flow diagram of a test of a split network in one embodiment;
FIG. 8 is a training architecture diagram of a split network in one embodiment;
FIG. 9 is a flow diagram of an application of a split network in one embodiment;
FIG. 10 is a block diagram of a training apparatus for a split network in one embodiment;
FIG. 11 is an internal block diagram of a computer device in one embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
The present application relates to the field of Artificial Intelligence (AI) technology. Artificial intelligence is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use knowledge to obtain optimal results. In other words, artificial intelligence is an integrated technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence, that is, the research on the design principles and implementation methods of various intelligent machines, enables machines to have the functions of perception, reasoning and decision-making. The scheme provided by the embodiments of the present application relates to a training method of an artificial intelligence segmentation network, which is explained specifically through the following embodiments.
The training method for the segmentation network provided by the present application can be applied to the segmentation network training system shown in fig. 1. As shown in fig. 1, the segmentation network training system includes a terminal 110 and a server 120. In one embodiment, the terminal 110 and the server 120 may each independently perform the training method for the segmentation network provided in the embodiments of the present application. The terminal 110 and the server 120 may also cooperate to perform the training method for the segmentation network provided in the embodiments of the present application. When the terminal 110 and the server 120 cooperate to perform the training method for the segmentation network provided in the embodiments of the present application, the terminal 110 acquires a sample image including the target geometry and determines a labeling geometric region corresponding to the target geometry based on the sample image. The terminal 110 sends the sample image and the corresponding labeling geometric region to the server 120. The server 120 performs target segmentation processing on the sample image through a segmentation network to be trained to obtain a predicted geometric region corresponding to the target geometric figure, and determines predicted vertex information corresponding to the target geometric figure based on image features in the target segmentation processing. The server 120 determines the corresponding region feature loss from the predicted geometric region and the labeling geometric region. The server 120 determines the corresponding geometric feature loss from the predicted geometric region and the predicted polygon region determined by the predicted vertex information. The server 120 constructs a target loss function based on the region feature loss and the geometric feature loss. The server 120 trains the segmentation network to be trained through the target loss function until the training stop condition is reached, obtaining a trained target segmentation network; the target segmentation network is used for segmenting a target geometric figure from an image to be processed.
The terminal 110 uploads the image to be processed to the server 120, and the server 120 performs target segmentation processing on the image to be processed through the trained segmentation network to obtain target geometric figure and target vertex information in the image to be processed. The server 120 returns the target geometry and target vertex information to the terminal 110. The sample image and the image to be processed are shown at 112 in fig. 1 and the target geometry is shown at 114 in fig. 1.
The terminal 110 may be, but is not limited to, a personal computer, a notebook computer, a smartphone, a tablet computer or a portable wearable device, and the server 120 may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing cloud computing services. The terminal 110 and the server 120 may be directly or indirectly connected through wired or wireless communication, which is not limited in the present application.
In one embodiment, as shown in fig. 2, a training method for a segmentation network is provided, which is described by taking a computer device (the computer device may specifically be the terminal or the server in fig. 1) as an example, and includes the following steps:
Step S202, a sample image comprising the target geometry is acquired, and a labeling geometry region corresponding to the target geometry is determined based on the sample image.
The target geometric figure refers to a vector graphic formed by the outer contour lines of a target area in the sample image, where the target area is the area of interest to the user. The target geometry may be, but is not limited to, a rectangle, trapezoid, circle, ellipse or polygon, and objects with such geometric shapes, such as documents and sheets, can serve as targets. For example, the sample image may be an image containing an identity document; in that case the target area is the area where the identity document is located, and the target geometric figure is the figure formed by the outer contour lines of the identity document. The identity document is a document representing the identity of a user, such as a resident identity card, a Hong Kong and Macau travel permit, or a temporary residence permit, corresponding to different regions.
The labeling geometric region refers to a region where a target geometric figure labeled in advance is located. The labeling geometric region may be a region where a target geometric figure labeled in advance in the sample image is located, or may be a mask image corresponding to the sample image labeled in advance. The mask image is an image filter template for identifying the target geometric figure in the image, and can shade other parts of the image and screen the target geometric figure in the image. For example, binarization processing is performed on the sample image to represent each pixel point of the region where the target geometry exists in the sample image with 1, and each pixel point of the region where the non-target geometry exists with 0, so as to obtain the mask image. The region formed by each pixel point denoted by 1 is the labeling geometric region.
Specifically, the computer device acquires a sample image including the target geometry, and the target geometry is marked in the sample image in advance. Or the computer equipment acquires a sample image comprising the target geometric figure and acquires a corresponding mask image, wherein the mask image marks the area where the target geometric figure is located.
For example, the sample image is an image containing the target certificate, the area where the target certificate is located in the sample image is marked in advance, and the area where the target certificate is located is the marked geometric area. Or performing binarization processing on the image containing the target certificate to obtain a corresponding mask map, wherein the pixel points of the region where the target certificate is located in the mask map are represented by 1, and the pixel points of the region where the non-target certificate is located are represented by 0, so that the region where the target certificate is located in the mask map is taken as a labeling geometric region.
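For illustration, a minimal sketch of how the labeling geometric region can be prepared as a binary mask image is given below, assuming the annotation is available as polygon vertices. The use of OpenCV's fillPoly and the 0/1 uint8 mask format are assumptions, not requirements of the method.

```python
# Hedged sketch: rasterize an annotated target geometry into a 0/1 mask image.
import numpy as np
import cv2

def make_label_mask(image_shape, annotated_vertices):
    """Pixels inside the annotated target geometry are set to 1, all others to 0."""
    mask = np.zeros(image_shape[:2], dtype=np.uint8)
    pts = np.array(annotated_vertices, dtype=np.int32).reshape(-1, 1, 2)
    cv2.fillPoly(mask, [pts], 1)   # region formed by pixels equal to 1 is the labeling geometric region
    return mask
```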
Step S204, performing target segmentation processing on the sample image through a segmentation network to be trained to obtain a predicted geometric area corresponding to the target geometric figure, and determining predicted vertex information corresponding to the target geometric figure based on image characteristics in the target segmentation processing.
The predicted vertex information refers to key points predicted by the segmentation network and forming the external contour of the target geometric figure in the sample image. The predicted vertex information may include vertex positions, which may include vertex coordinates, vertex numbers, vertex pixels, and the like. The predicted vertex information can determine the shape and size of the graph.
Specifically, when the labeling geometric region is included in the sample image, the computer device inputs the sample image into the segmentation network to be trained. When the target geometric figure is not marked in the sample image, the computer device inputs the sample image including the target geometric figure and the labeling geometric region corresponding to the target geometric figure into the segmentation network to be trained.
And extracting features of the sample image by the segmentation network to be trained to obtain a corresponding feature map, and predicting the region where the target geometric figure in the sample image is located based on the feature map to obtain a predicted geometric region. Further, the segmentation network to be trained segments the predicted geometric region from the sample image.
And predicting vertex information of an external contour forming the target geometric figure in the sample image based on the extracted feature map by the segmentation network to be trained to obtain predicted vertex information.
Step S206, determining the corresponding regional characteristic loss according to the predicted geometric region and the labeling geometric region.
Wherein the region feature loss comprises at least one of a region color loss between the predicted geometric region and the labeled geometric region, and a region segmentation loss between the predicted geometric region and the labeled geometric region.
In particular, the computer device may determine a color of the predicted geometric region and a color of the labeling geometric region, and determine a region color loss between the predicted geometric region and the labeling geometric region based on the color of the predicted geometric region and the color of the labeling geometric region. The computer device treats the region color loss as a region feature loss between the predicted geometric region and the labeled geometric region.
In one embodiment, the computer device may obtain a first region area corresponding to the predicted geometric region and a second region area corresponding to the labeled geometric region, calculate a region segmentation loss between the predicted geometric region and the labeled geometric region based on the first region area and the second region area. The computer device treats the region segmentation loss as a region feature loss between the predicted geometric region and the labeled geometric region.
In one embodiment, the computer device treats the region color loss and the region segmentation loss between the predicted geometric region and the labeled geometric region as a region feature loss between the predicted geometric region and the labeled geometric region.
Step S208, corresponding geometric feature loss is determined according to the predicted geometric region and the predicted polygon region determined by the predicted vertex information.
Wherein the geometric feature loss comprises at least one of a geometric area loss between the predicted geometric region and the predicted polygonal region, and a geometric center of gravity loss between the predicted geometric region and the predicted polygonal region.
Specifically, the computer device determines a predicted polygon area in the sample image based on the predicted vertex information. The computer device may determine a third region area corresponding to the predicted polygonal region, and calculate a geometric area loss between the predicted geometric region and the predicted polygonal region based on the first region area and the third region area. The computer device may treat the geometric area loss as a geometric feature loss between the predicted geometric region and the predicted polygonal region.
In one embodiment, the computer device may determine a second centroid position corresponding to the predicted polygon area based on the predicted vertex information and determine a first centroid position of the predicted geometric area. The computer device determines a geometric center of gravity loss between the predicted geometric region and the predicted polygonal region based on the second center of gravity position and the first center of gravity position. The computer device may treat the geometric center of gravity loss as a geometric feature loss between the predicted geometric region and the predicted polygonal region.
In one embodiment, the computer device treats the geometric area loss and the geometric center of gravity loss between the predicted geometric region and the predicted polygonal region as the geometric feature loss between the predicted geometric region and the predicted polygonal region.
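For illustration, a hedged sketch of one way to compute such a geometric center of gravity loss is given below. It assumes the centroid of the predicted geometric region is the gray-value-weighted mean coordinate of its mask, the centroid of the predicted polygon is the mean of the predicted vertices, and the L1 distance is used as the loss; these are illustrative choices rather than the exact formulation of the patent.

```python
# Hedged sketch: geometric center of gravity loss between the predicted
# geometric region (soft mask) and the predicted polygon (vertex list).
import numpy as np

def gravity_loss(pred_mask, pred_vertices):
    ys, xs = np.indices(pred_mask.shape)
    total = pred_mask.sum() + 1e-8
    mask_centroid = np.array([(xs * pred_mask).sum() / total,   # first center of gravity position
                              (ys * pred_mask).sum() / total])
    poly_centroid = np.asarray(pred_vertices, dtype=np.float64).mean(axis=0)  # second center of gravity position
    return np.abs(mask_centroid - poly_centroid).sum()          # L1 distance as the loss (assumption)
```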
Step S210, constructing an objective loss function based on the regional characteristic loss and the geometric characteristic loss.
Specifically, the computer device may obtain weights corresponding to the regional feature loss and the geometric feature loss, respectively, and construct a target loss function according to the regional feature loss, the geometric feature loss, and the corresponding weights.
In one embodiment, the computer device performs a weighted calculation of the regional feature loss and the geometric feature loss to obtain the target loss function. Alternatively, the computer device may multiply the regional feature loss and the geometric feature loss, take their logarithms, or perform other mathematical operations to arrive at the target loss function.
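As a minimal sketch, the weighted construction of the target loss can be written as follows; the weight values are placeholders and not taken from the specification.

```python
# Hedged sketch: weighted sum of the region feature loss and the geometric feature loss.
def target_loss(region_feature_loss, geometric_feature_loss, w_region=1.0, w_geom=1.0):
    return w_region * region_feature_loss + w_geom * geometric_feature_loss
```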
Step S212, training the segmented network to be trained through the target loss function until reaching the training stopping condition, and obtaining a trained target segmented network; the object segmentation network is used for segmenting an object geometric figure from an image to be processed.
Specifically, the computer device may train the segmentation network to be trained through the objective loss function, adjust parameters of the segmentation network during the training process, and continue the training until the segmentation network meets the training stop condition, thereby obtaining a trained objective segmentation network.
In this embodiment, the training stop condition may be at least one of: the loss value of the segmentation network being less than or equal to a loss threshold, reaching a preset number of iterations, reaching a preset iteration time, and the segmentation performance of the network reaching a preset index.
For example, a loss value generated in each training is calculated through the target loss function, parameters of the segmentation network are adjusted based on the difference between the loss value and the loss threshold value, and training is continued until training is stopped, so that a trained target segmentation network is obtained.
As another example, the computer device counts the number of iterations of the segmentation network during training, and stops training when the number of iterations reaches the preset number of iterations, thereby obtaining the trained segmentation network.
In one embodiment, each training step of the segmentation network may use a preset number of sample images, for example 32 sample images. During training, the computer device may update the parameters of the segmentation network based on the Adam gradient descent method, with the initial learning rate of the segmentation network set to 0.05 and the Adam betas set to (0.95, 0.9995). The classification of each pixel can be predicted through the segmentation network to obtain a predicted geometric region whose size is the same as that of the target geometric figure. In addition, the segmentation network may output the predicted vertex information corresponding to the target geometry. The error gradient is calculated based on the target loss function and the segmentation network is updated through back-propagation, thereby obtaining the trained target segmentation network.
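For illustration, a hedged PyTorch sketch of such a training loop is given below, using the batch size of 32, the initial learning rate of 0.05 and the Adam betas (0.95, 0.9995) stated above. The network, data loader and loss functions are placeholders assumed to be defined elsewhere.

```python
# Hedged sketch: training loop for the segmentation network with Adam.
import torch

def train(segmentation_net, data_loader, region_loss_fn, geometric_loss_fn,
          num_epochs=10, device="cuda"):
    optimizer = torch.optim.Adam(segmentation_net.parameters(),
                                 lr=0.05, betas=(0.95, 0.9995))
    segmentation_net.to(device).train()
    for epoch in range(num_epochs):
        for sample_images, label_masks in data_loader:          # e.g. batches of 32 sample images
            sample_images = sample_images.to(device)
            label_masks = label_masks.to(device)
            pred_masks, pred_vertices = segmentation_net(sample_images)
            loss = (region_loss_fn(pred_masks, label_masks)              # region feature loss
                    + geometric_loss_fn(pred_masks, pred_vertices))      # geometric feature loss
            optimizer.zero_grad()
            loss.backward()                                     # error gradient by back-propagation
            optimizer.step()                                    # update segmentation network parameters
    return segmentation_net
```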
According to the training method of the segmentation network, the sample image comprising the target geometric figure is subjected to target segmentation processing through the segmentation network to be trained, so that the predicted geometric region, which is predicted by the segmentation network and corresponds to the target geometric figure, can be obtained. Based on the image features in the target segmentation process, the predicted vertex information of the target geometry predicted by the segmentation network in the sample image can be obtained. And constructing a target loss function according to the regional characteristic loss between the predicted geometric region and the labeling geometric region and the geometric characteristic loss between the predicted geometric region and the predicted polygonal region determined by the predicted vertex information, so that the target loss function comprises multiple loss characteristics. The segmentation network to be trained is trained based on the loss in multiple aspects, and the influence of the loss in all aspects on the recognition and segmentation of the segmentation network can be fully considered, so that the recognition and segmentation accuracy of the segmentation network can be improved through training. The target geometric figure can be accurately identified and segmented from the image through the trained target segmentation network. And the vertex information of the target geometric figure in the image can be accurately output, so that the target geometric figure in the image can be accurately positioned.
In one embodiment, as shown in fig. 3, performing a target segmentation process on a sample image through a segmentation network to be trained to obtain a predicted geometric area corresponding to a target geometric figure, and determining predicted vertex information corresponding to the target geometric figure based on image features in the target segmentation process, including:
step S302, sampling processing is carried out on the sample image, and a corresponding sample sampling image is obtained.
The sampling processing includes up-sampling processing and down-sampling processing. Up-sampling refers to enlarging the sample image by image interpolation, so that the enlarged image has a higher resolution. Down-sampling, also referred to as sub-sampling, refers to reducing the image to obtain the desired image resolution. The sample sampling image refers to the image obtained by the up-sampling or down-sampling processing and includes a sample up-sampled image and a sample down-sampled image: when the sample image is down-sampled, the sample sampling image is a sample down-sampled image, and when the sample image is up-sampled, the sample sampling image is a sample up-sampled image.
Specifically, the computer device performs downsampling processing on the sample image to obtain a corresponding sample downsampled image. Or the computer equipment performs up-sampling processing on the sample image to obtain a corresponding sample up-sampling image.
In one embodiment, the computer device may perform the sampling processing on the sample image through the segmentation network to be trained to obtain the corresponding sample sampling image. Further, up-sampling processing or down-sampling processing is performed on the sample image through the segmentation network to be trained to obtain the corresponding sample up-sampled image or sample down-sampled image.
Step S304, feature extraction is carried out on the sample image and the sample sampling image through a segmentation network to be trained, and a first feature image corresponding to the sample image and a second feature image corresponding to the sample sampling image are obtained.
Specifically, the segmentation network to be trained performs multi-layer feature extraction on the sample image to obtain a first feature map corresponding to the sample image. Further, the segmentation network to be trained performs feature extraction on the sample image through the multi-layer convolution layers, and takes the first feature image output by the previous convolution layer as the input of the next convolution layer to obtain the first feature image output by each layer of convolution layer.
And carrying out multi-layer feature extraction on the sample sampling image by the segmentation network to be trained to obtain a second feature map corresponding to the sample sampling image. Further, the segmentation network to be trained performs feature extraction on the sample sampled image through the multi-layer convolution layers, and takes the second feature image output by the previous convolution layer as the input of the next convolution layer to obtain the second feature image output by each layer of convolution layer.
In one embodiment, the partitioning network to be trained includes two encoders, each of which includes multiple convolutional layers. The segmentation network to be trained inputs the sample image and the sample sampling image into two encoders respectively, and performs feature extraction on the sample image and the sample sampling image respectively to obtain a corresponding first feature map and a corresponding second feature map.
Step S306, fusion processing is carried out on the first feature map and the second feature map, and a predicted geometric area corresponding to the target geometric figure is obtained based on the fused sample fusion feature map after the fusion processing.
Specifically, the segmentation network to be trained fuses the first feature map and the second feature map to obtain a sample fusion feature map. The segmentation network to be trained then performs a 1×1 convolution on the sample fusion feature map and performs sampling processing on the feature map obtained by the 1×1 convolution to obtain the predicted geometric region in the sample image.
In this embodiment, the fusion processing of the first feature map and the second feature map may be concatenation processing or superposition processing. Concatenation processing refers to splicing the first feature map and the second feature map together to obtain one feature map. Superposition processing refers to summing the gray values of corresponding pixel points in the first feature map and the second feature map and taking the average value, so as to obtain one feature map.
It will be appreciated that the sampling processing performed after the 1×1 convolution is the opposite of the sampling processing used to obtain the sample sampling image. For example, if the sample image was down-sampled to obtain the sample sampling image, the sampling processing after the 1×1 convolution is up-sampling processing; if the sample image was up-sampled to obtain the sample sampling image, the sampling processing after the 1×1 convolution is down-sampling processing.
And step S308, performing convolution and full connection processing on the second feature map to obtain predicted vertex information corresponding to the target geometric figure.
Specifically, the segmentation network to be trained performs a 1×1 convolution on the second feature map output by the last convolution layer and passes the feature map obtained by the 1×1 convolution to the fully connected layer. Full connection processing is performed on the feature map through the fully connected layer to obtain the predicted vertex information corresponding to the target geometric figure in the sample sampling image.
In one embodiment, the segmentation network to be trained may perform convolution and full connection processing on the first feature map to obtain predicted vertex information corresponding to the target geometry.
In one embodiment, the predicted vertex information comprises predicted vertex coordinates. And the segmentation network to be trained carries out convolution and full connection processing on the first feature map or the second feature map output by the last convolution layer to obtain the predicted vertex coordinates corresponding to the target geometric figure.
In this embodiment, the sample image is sampled to obtain a sample image, and the sample image and the sample image are respectively subjected to feature extraction through the to-be-trained segmentation network, so that features of images with different resolutions can be obtained. The fusion processing is carried out based on the extracted first feature image and the second feature image, so that the prediction geometric area where the target geometric figure is located in the sample image can be predicted more accurately based on the image features with different resolutions. And outputting predicted vertex information corresponding to the target geometric figure based on the full-connection processing of the second feature graph, so that the segmentation network is trained based on the predicted geometric region and the predicted vertex information obtained by different processing.
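For illustration, a hedged PyTorch sketch of the two-branch structure described in this embodiment is given below: one encoder for the sample image and one for its down-sampled version, fusion of the two feature maps by concatenation, a 1×1 convolution for the predicted geometric region, and a 1×1 convolution followed by a fully connected layer for the predicted vertex coordinates. The channel widths, encoder depths and the assumption of four vertices are illustrative choices, not values given in the text.

```python
# Hedged sketch: dual-branch segmentation network with a mask head and a vertex head.
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(c_in, c_out):
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True))

class DualBranchSegNet(nn.Module):
    def __init__(self, channels=32, num_vertices=4):
        super().__init__()
        self.num_vertices = num_vertices
        self.encoder_full = nn.Sequential(conv_block(3, channels), conv_block(channels, channels))
        self.encoder_down = nn.Sequential(conv_block(3, channels), conv_block(channels, channels))
        self.fuse = nn.Conv2d(2 * channels, 1, kernel_size=1)      # 1x1 convolution on the fused features
        self.vertex_conv = nn.Conv2d(channels, 8, kernel_size=1)   # 1x1 convolution before the FC head
        self.vertex_fc = nn.LazyLinear(num_vertices * 2)           # outputs (x, y) per predicted vertex

    def forward(self, x):
        x_down = F.interpolate(x, scale_factor=0.5, mode="bilinear", align_corners=False)
        feat_full = self.encoder_full(x)                            # first feature map (sample image)
        feat_down = self.encoder_down(x_down)                       # second feature map (sample sampling image)
        feat_down_up = F.interpolate(feat_down, size=feat_full.shape[-2:],
                                     mode="bilinear", align_corners=False)
        fused = torch.cat([feat_full, feat_down_up], dim=1)         # fusion by concatenation
        pred_mask = torch.sigmoid(self.fuse(fused))                 # predicted geometric region
        v = self.vertex_conv(feat_down).flatten(1)
        pred_vertices = self.vertex_fc(v).view(-1, self.num_vertices, 2)  # predicted vertex coordinates
        return pred_mask, pred_vertices
```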
In one embodiment, as shown in fig. 4, determining the corresponding region feature loss from the predicted geometric region and the labeled geometric region includes:
step S402, a first color value corresponding to the predicted geometric region and a second color value corresponding to the labeling geometric region are determined.
Specifically, the computer device may determine each pixel point in the prediction geometry region, and obtain a first gray value corresponding to each pixel point. The computer equipment obtains the color value corresponding to each pixel point, and calculates a first color value corresponding to the prediction geometric area based on the color value and the first gray value corresponding to each pixel point.
The computer equipment obtains second gray values respectively corresponding to the pixel points in the labeling geometric area, and obtains color values respectively corresponding to the pixel points. The computer equipment obtains the color value corresponding to each pixel point, and calculates a second color value corresponding to the labeling geometric area based on the color value and the second gray value corresponding to each pixel point.
Step S404, determining the regional color loss between the predicted geometric region and the labeling geometric region based on the difference between the first color value and the second color value.
Wherein the difference between the first color value and the second color value may be characterized by a contrast value between the first color value and the second color value, the contrast value may be at least one of a difference value, an absolute value of the difference value, a logarithm of the difference value, a ratio, and a percentage.
In particular, the computer device calculates a difference between the first color value and the second color value, determining the difference between the first color value and the second color value as a region color loss between the predicted geometric region and the labeling geometric region. Further, the difference between the first color value and the second color value may be a contrast value between the first color value and the second color value.
In one embodiment, the contrast value refers to the difference between the first color value and the second color value. The computer device calculates a difference between the first color value and the second color value, and takes the difference as a region color loss between the predicted geometric region and the labeled geometric region.
In one embodiment, the contrast value refers to the absolute value of the difference between the first color value and the second color value. The computer device calculates a difference between the first color value and the second color value and takes an absolute value of the difference as a region color loss between the predicted geometric region and the labeled geometric region. The region color loss between the predicted geometry region and the labeled geometry region is calculated, for example, by the following formula:
Loss1 = |C0 − C|   (1)
where Loss1 is the region color loss, C0 is the second color value corresponding to the labeling geometric region, and C is the first color value corresponding to the predicted geometric region.
In one embodiment, the contrast value refers to a ratio between the first color value and the second color value. The computer device calculates a ratio between the first color value and the second color value as a region color loss between the predicted geometric region and the labeled geometric region.
Step S406, determining a first area corresponding to the predicted geometric area and a second area corresponding to the labeling geometric area.
Specifically, the computer device may obtain first gray values corresponding to each pixel point in the prediction geometry region, and calculate a first region area corresponding to the prediction geometry region based on the first gray values corresponding to each pixel point.
The computer equipment obtains second gray values corresponding to the pixel points in the labeling geometric area, and calculates the second area corresponding to the labeling geometric area based on the second gray values corresponding to the pixel points.
Step S408, determining a region segmentation loss between the prediction geometry region and the labeling geometry region based on the first region area and the second region area.
In particular, the computer device may determine an area product of the first area and the second area, and a sum of the areas of the first area and the second area, and determine a region segmentation loss between the predicted geometric region and the labeled geometric region based on the area product and the sum of the areas. Further, the computer device may treat the ratio of the area product to the sum of the areas as a region segmentation loss between the predicted geometric region and the labeled geometric region.
In one embodiment, the computer device may take the ratio of the area product of the preset multiple to the sum of the areas as the region segmentation loss between the predicted geometric region and the labeled geometric region. For example, dividing the product of two times the area by the sum of the areas yields the area division loss.
The region segmentation loss between the predicted geometric region and the labeling geometric region is calculated, for example, by the following formula:
Loss2 = 2·Σi,j(GTi,j(x)·Ii,j(x)) / (Σi,jGTi,j(x) + Σi,jIi,j(x))
where GTi,j(x) is the gray value of the pixel point in the ith row and jth column of the labeling geometric region, Σi,jGTi,j(x) is the second region area of the labeling geometric region, Ii,j(x) is the gray value of the pixel point in the ith row and jth column of the predicted geometric region, and Σi,jIi,j(x) is the first region area of the predicted geometric region.
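For illustration, a hedged numpy sketch of this region segmentation loss is given below, reading the formula above as a Dice-style ratio in which the numerator is twice the pixel-wise product of the predicted and labeled masks; this reading is an interpretation of the definitions above rather than a reproduction of the original drawing.

```python
# Hedged sketch: region segmentation loss between the predicted and labeled masks.
import numpy as np

def region_segmentation_loss(pred_mask, gt_mask, eps=1e-8):
    """pred_mask, gt_mask: gray-value maps in [0, 1] of the same shape."""
    overlap = 2.0 * (pred_mask * gt_mask).sum()        # 2 * sum_ij GT_ij(x) * I_ij(x)
    areas = pred_mask.sum() + gt_mask.sum()            # first region area + second region area
    return overlap / (areas + eps)
```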
Step S410, determining corresponding regional characteristic loss according to regional color loss and regional segmentation loss.
Specifically, the computer device may obtain weights corresponding to the region color loss and the region segmentation loss, and perform weighted summation processing on the region color loss and the region segmentation loss to obtain corresponding region feature loss.
In one embodiment, the computer device may sum the region color loss and the region segmentation loss to yield corresponding region feature loss.
In this embodiment, the color loss of the region is determined based on the color difference between the predicted geometric region and the labeling geometric region, so that the color loss between the geometric region predicted by the segmentation network and the real geometric region can be determined. Based on the area of the predicted geometric area and the area of the marked geometric area, the segmentation difference between the geometric area and the real geometric area obtained by the prediction of the segmentation network can be determined, so that the color loss and the segmentation loss of the area are used as conditions for the training of the segmentation network, and the trained segmentation network has higher accuracy and segmentation precision.
In one embodiment, as shown in fig. 5, determining the first color value corresponding to the predicted geometric region and the second color value corresponding to the labeling geometric region includes:
Step S502, obtaining the values of the color channels corresponding to the pixel points in the prediction geometric area, and obtaining the values of the color channels corresponding to the pixel points in the labeling geometric area.
Wherein the color channels refer to the red, green and blue channels of the image. The Red channel is the R (Red) channel, the Green channel is the G (Green) channel, and the Blue channel is the B (Blue) channel.
Specifically, the computer device may obtain each pixel point in the prediction geometry region, and for each pixel point, the computer device may obtain a value of a red channel, a value of a green channel, and a value of a blue channel corresponding to the pixel point.
The computer device may obtain each pixel in the labeling geometry, and for each pixel, the computer device may obtain a value of a red channel, a value of a green channel, and a value of a blue channel corresponding to the pixel.
Step S504, determining a first color value corresponding to the prediction geometric region according to the value of the color channel corresponding to each pixel point in the prediction geometric region, the first gray value corresponding to the corresponding pixel point and the first region area.
Specifically, the computer device obtains a first gray value corresponding to each pixel point in the prediction geometric region, and a first region area corresponding to the prediction geometric region. For each pixel point, the computer equipment calculates the product of the value of each color channel and the first gray value of the corresponding pixel point to obtain the first product corresponding to each color channel.
The computer device calculates the sum of the first products of the same color channel, i.e. the sum of the first products corresponding to each red color channel, the sum of the first products corresponding to each green color channel, and the sum of the first products corresponding to each blue color channel. The computer equipment calculates a first ratio of the sum of the first products of the same color channels and the first area, and takes the first ratio as the color value of the color channel, thereby obtaining the color value corresponding to the red channel, the color value corresponding to the green channel and the color value corresponding to the blue channel.
The computer device calculates a first color value corresponding to the predicted geometric region according to the color value corresponding to each color channel. Further, the computer device sums up the color values corresponding to each color channel and takes the average value as a first color value corresponding to the prediction geometry region.
For example, the computer device may calculate the first color value corresponding to the predicted geometric region according to the following formulas:
color_r1 = Σi,j(Ri,j(x)·Ii,j(x)) / Σi,jIi,j(x)
color_g1 = Σi,j(Gi,j(x)·Ii,j(x)) / Σi,jIi,j(x)
color_b1 = Σi,j(Bi,j(x)·Ii,j(x)) / Σi,jIi,j(x)
C = (color_r1 + color_g1 + color_b1) / 3
where C is the first color value corresponding to the predicted geometric region; Ri,j(x), Gi,j(x) and Bi,j(x) are the values of the pixel point in the ith row and jth column of the predicted geometric region in the red, green and blue channels respectively; Ii,j(x) is the gray value corresponding to the pixel point in the ith row and jth column of the predicted geometric region; Σi,jIi,j(x) is the first region area corresponding to the predicted geometric region; and color_r1, color_g1 and color_b1 are the average values of the pixels in the predicted geometric region in the red, green and blue channels respectively.
Step S506, determining a second color value corresponding to the labeling geometric region according to the value of the color channel corresponding to each pixel point in the labeling geometric region, the second gray value corresponding to the corresponding pixel point and the second region area.
Specifically, the computer equipment obtains a second gray value corresponding to each pixel point in the labeling geometric region and a second region area corresponding to the labeling geometric region. For each pixel point, the computer equipment calculates the product of the value of each color channel and the second gray value of the corresponding pixel point to obtain a second product corresponding to each color channel.
The computer device calculates the sum of the second products of the same color channel, i.e. the sum of the second products corresponding to each red color channel, the sum of the second products corresponding to each green color channel, and the sum of the second products corresponding to each blue color channel. The computer equipment calculates a second ratio of the sum of second products of the same color channel and the second area, and takes the second ratio as a color value of the color channel, thereby obtaining a color value corresponding to the red channel, a color value corresponding to the green channel and a color value corresponding to the blue channel.
And the computer equipment calculates a second color value corresponding to the labeling geometric region according to the color value corresponding to each color channel. Further, the computer device sums up the color values corresponding to each color channel and takes the average value as a second color value corresponding to the labeling geometric region.
For example, the computer device may calculate the second color value corresponding to the labeling geometric region according to the following formulas:
color_r2 = Σi,j(Ri,j(y)·GTi,j(x)) / Σi,jGTi,j(x)
color_g2 = Σi,j(Gi,j(y)·GTi,j(x)) / Σi,jGTi,j(x)
color_b2 = Σi,j(Bi,j(y)·GTi,j(x)) / Σi,jGTi,j(x)
C0 = (color_r2 + color_g2 + color_b2) / 3
where C0 is the second color value corresponding to the labeling geometric region; Ri,j(y), Gi,j(y) and Bi,j(y) are the values of the pixel point in the ith row and jth column of the labeling geometric region in the red, green and blue channels respectively; GTi,j(x) is the gray value corresponding to the pixel point in the ith row and jth column of the labeling geometric region; Σi,jGTi,j(x) is the second region area corresponding to the labeling geometric region; and color_r2, color_g2 and color_b2 are the average values of the pixels in the labeling geometric region in the red, green and blue channels respectively.
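For illustration, a hedged numpy sketch of the mean-color computation and of the region color loss Loss1 = |C0 − C| is given below; the image is assumed to be an RGB array and the masks gray-value maps in [0, 1], and the function names are illustrative.

```python
# Hedged sketch: gray-value-weighted mean color of a region and the region color loss.
import numpy as np

def region_mean_color(image_rgb, mask, eps=1e-8):
    area = mask.sum() + eps                              # region area = sum of gray values
    color_r = (image_rgb[..., 0] * mask).sum() / area    # average value in the red channel
    color_g = (image_rgb[..., 1] * mask).sum() / area    # average value in the green channel
    color_b = (image_rgb[..., 2] * mask).sum() / area    # average value in the blue channel
    return (color_r + color_g + color_b) / 3.0           # average over the three color channels

def region_color_loss(image_rgb, pred_mask, gt_mask):
    c_pred = region_mean_color(image_rgb, pred_mask)     # first color value C
    c_gt = region_mean_color(image_rgb, gt_mask)         # second color value C0
    return abs(c_gt - c_pred)                            # Loss1 = |C0 - C|
```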
In this embodiment, based on the values of the color channels corresponding to the pixel points in the prediction geometry region, the average color value of the prediction geometry region can be accurately calculated according to the first gray value and the first region area corresponding to the corresponding pixel point. Based on the values of the color channels corresponding to the pixel points in the labeling geometric area, the second gray value corresponding to the corresponding pixel point and the second area can accurately calculate the average color value of the labeling geometric area. Based on the difference between the average color value of the predicted geometric area and the average color value of the labeling geometric area, the color loss between the predicted geometric area and the real geometric area predicted by the segmentation network can be accurately determined, so that the color loss is used as a training condition of the segmentation network, and the precision of the segmentation network can be improved.
In one embodiment, determining a first region area corresponding to the predicted geometric region and a second region area corresponding to the labeled geometric region includes:
Acquiring a first gray value corresponding to each pixel point in the prediction geometric area, and acquiring a second gray value corresponding to each pixel point in the labeling geometric area; taking the sum of the first gray values of all pixel points in the prediction geometric area as a first area of the prediction geometric area; and taking the sum of the second gray values of the pixel points in the labeling geometric area as the second area of the labeling geometric area.
Specifically, the computer device may obtain first gray values corresponding to each pixel point in the prediction geometry region, sum the first gray values corresponding to each pixel point, and use the sum of the first gray values of each pixel point as the first region area of the prediction geometry region.
The computer device may obtain the second gray values corresponding to the pixel points in the labeling geometric region, sum the second gray values corresponding to the pixel points, and take the sum of the second gray values of the pixel points as the second region area of the labeling geometric region.
For example, the computer device may calculate a first region area of the predicted geometric region and a second region area of the labeled geometric region according to the following formula:
S_1 = Σ_{i,j} I_{i,j}(x) (11)
Wherein S_1 denotes a region area, for example the first region area or the second region area. I_{i,j}(x) is the gray value of the pixel in the ith row and jth column, that is, the gray value of the pixel in the ith row and jth column of the prediction geometric region, or of the labeling geometric region, respectively.
In this embodiment, taking the sum of the gray values of the pixel points as the region area allows the area made up of the pixels of the prediction geometric region, and likewise the area made up of the pixels of the labeling geometric region, to be calculated accurately.
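A small sketch of the soft region area of equation (11), with `pred_mask` and `gt_mask` as illustrative H x W arrays standing for the predicted and labeling geometric regions:

```python
import numpy as np

def soft_area(mask: np.ndarray) -> float:
    """Equation (11): region area as the sum of the pixel gray values."""
    return float(mask.sum())

# Illustrative masks: a soft predicted mask and a binary labeling mask.
pred_mask = np.random.rand(64, 64)
gt_mask = (np.random.rand(64, 64) > 0.5).astype(np.float32)
first_area = soft_area(pred_mask)    # first region area (predicted geometric region)
second_area = soft_area(gt_mask)     # second region area (labeling geometric region)
```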
In one embodiment, as shown in fig. 6, determining the corresponding geometric feature loss from the predicted geometric region and the predicted polygon region determined from the predicted vertex information comprises:
Step S602, determining a first area of the predicted geometric region, and calculating a third area of the predicted polygonal region determined by the predicted vertex information.
Specifically, the computer device obtains first gray values corresponding to each pixel point in the prediction geometric area, and takes the sum of the first gray values of each pixel point in the prediction geometric area as the first area of the prediction geometric area.
The computer device may obtain predicted vertex coordinates in the predicted vertex information, determine a predicted polygon in the sample image for which the predicted vertex coordinates are determined. The computer device calculates a third region area of the predicted polygon region based on the predicted vertex coordinates. For example, the computer device may calculate a third region area of the predicted polygonal region according to the shoelace theorem (Shoelace formula):
S_2 = |P_11(P_22 - P_42) + P_21(P_32 - P_12) + P_31(P_42 - P_22) + P_41(P_12 - P_32)| / 2 (12)
wherein S_2 is the third region area and [P_11, P_12], [P_21, P_22], [P_31, P_32], [P_41, P_42] are the predicted vertex coordinates, taken in order around the predicted polygonal region.
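A sketch of the shoelace computation for the predicted vertex coordinates; it assumes the vertices are supplied in consecutive order around the quadrilateral (clockwise or counter-clockwise), which the text does not state explicitly:

```python
def shoelace_area(vertices) -> float:
    """Shoelace formula: area of a simple polygon from its ordered (x, y) vertices."""
    n = len(vertices)
    acc = 0.0
    for k in range(n):
        x_k, y_k = vertices[k]
        x_next, y_next = vertices[(k + 1) % n]
        acc += x_k * y_next - x_next * y_k
    return abs(acc) / 2.0

# Third region area of the predicted polygonal region from the four predicted vertices.
third_area = shoelace_area([(10.0, 12.0), (90.0, 15.0), (95.0, 80.0), (8.0, 78.0)])
```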
In step S604, a geometric area loss is determined according to the difference between the first area and the third area.
Wherein the difference between the first area and the third area may be characterized by an area contrast value between the first area and the third area, which may be at least one of a difference value, an absolute value of the difference value, a logarithm of the difference value, a ratio, and a percentage.
Specifically, the computer device calculates the difference between the first region area and the third region area, and determines this difference as the geometric area loss between the predicted geometric region and the predicted polygonal region. Further, the difference between the first region area and the third region area may be expressed as an area contrast value between the first region area and the third region area.
In one embodiment, the area contrast value refers to the difference between the first area and the third area. The computer device calculates a difference between the first region area and the third region area, and takes the difference as a geometric area loss between the predicted geometric region and the predicted polygonal region.
In one embodiment, the area contrast value refers to the absolute value of the difference between the first area and the third area. The computer device calculates a difference between the first region area and the third region area, and takes an absolute value of the difference as a geometric area loss between the predicted geometric region and the predicted polygonal region. The geometric area loss between the predicted geometric region and the predicted polygon region is calculated, for example, by the following formula:
Loss_1 = |S_1 - S_2| (13)
Where Loss_1 is the geometric area loss between the predicted geometric region and the predicted polygonal region, S_1 is the first region area, and S_2 is the third region area.
In one embodiment, the area contrast value refers to the ratio between the first region area and the third region area. The computer device calculates the ratio between the first region area and the third region area and takes this ratio as the geometric area loss between the predicted geometric region and the predicted polygonal region.
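The area contrast values described in the embodiments above could, for example, take the following forms; the mode names are illustrative:

```python
def geometric_area_loss(s1: float, s2: float, mode: str = "abs") -> float:
    """Area contrast between the first region area s1 and the third region area s2."""
    if mode == "diff":
        return s1 - s2               # signed difference
    if mode == "abs":
        return abs(s1 - s2)          # equation (13)
    if mode == "ratio":
        return s1 / (s2 + 1e-8)      # ratio form
    raise ValueError(f"unknown mode: {mode}")
```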
Step S606, a first barycentric location corresponding to the predicted geometric region and a second barycentric location corresponding to the predicted polygonal region are determined.
Specifically, the computer device obtains first gray values corresponding to each pixel point in the prediction geometric area respectively, and determines a first gravity center position of the prediction geometric area according to each first gray value. The computer device determines a second centroid position of the predicted polygon area based on the predicted vertex information.
In step S608, a geometric center of gravity loss is determined based on the distance between the first center of gravity position and the second center of gravity position.
Specifically, the computer device calculates the distance between the first barycentric location and the second barycentric location, and uses this distance as the geometric center of gravity loss between the predicted geometric region and the predicted polygonal region.
In one embodiment, the computer device may obtain weights corresponding to the first center of gravity position and the second center of gravity position, respectively, and perform weighted summation processing on the first center of gravity position, the second center of gravity position, and the corresponding weights to obtain the geometric center of gravity loss.
In step S610, geometric feature loss is determined based on geometric area loss and geometric center of gravity loss.
Specifically, the computer device may obtain weights corresponding to the geometric area loss and the geometric center of gravity loss, and perform weighted summation processing on the geometric area loss and the geometric center of gravity loss to obtain corresponding geometric feature loss.
In one embodiment, the computer device may sum the geometric area loss and the geometric center of gravity loss to yield corresponding geometric feature losses.
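A small sketch of combining the two terms into the geometric feature loss; the plain sum corresponds to setting both weights to 1, and the weight values themselves are hyperparameters not specified here:

```python
def geometric_feature_loss(area_loss: float, centroid_loss: float,
                           w_area: float = 1.0, w_centroid: float = 1.0) -> float:
    """Weighted sum of the geometric area loss and the geometric center of gravity loss."""
    return w_area * area_loss + w_centroid * centroid_loss
```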
In this embodiment, geometric area loss is determined based on the area difference between the predicted geometric area and the predicted polygonal area, so that the area loss between the geometric area predicted by the segmentation network and the polygonal area can be determined, and large area loss indicates inaccurate prediction and segmentation of the segmentation network. Based on the barycenter position of the predicted geometric region and the barycenter position of the predicted polygonal region, a barycenter loss between the geometric region predicted by the segmentation network and the predicted polygonal region can be determined, and a large barycenter loss indicates inaccurate prediction and segmentation of the segmentation network. Therefore, the area loss and the gravity center loss are used as conditions for training the segmentation network, so that the trained segmentation network has higher accuracy and segmentation precision.
In one embodiment, the first center of gravity position comprises a first center of gravity coordinate and the second center of gravity position comprises a second center of gravity coordinate; determining a geometric center of gravity loss based on a separation between the first center of gravity position and the second center of gravity position, comprising:
Determining a first barycentric coordinate of the prediction geometric region according to the first gray values respectively corresponding to the pixel points in the prediction geometric region; and determining a second barycentric coordinate of the predicted polygonal region formed by the predicted vertex coordinates based on the predicted vertex coordinates in the predicted vertex information;
Determining a geometric center of gravity loss based on a separation between the first center of gravity position and the second center of gravity position, comprising: a geometric center of gravity loss between the predicted geometric region and the predicted polygon region is determined based on a distance between the first center of gravity coordinate and the second center of gravity coordinate.
Specifically, the computer device determines the abscissa and the ordinate of the first barycentric coordinate according to the first gray scale value corresponding to each pixel point in the prediction geometric area, so as to obtain the first barycentric coordinate of the prediction geometric area according to the abscissa and the ordinate.
The computer device obtains each predicted vertex coordinate in the predicted vertex information, averages the abscissas of the predicted vertex coordinates, and takes this average as the abscissa of the second barycentric coordinate of the predicted polygonal region. The computer device likewise averages the ordinates of the predicted vertex coordinates and takes this average as the ordinate of the second barycentric coordinate. The computer device then obtains the second barycentric coordinate of the predicted polygonal region from the abscissa and the ordinate. For example, the computer device may calculate the second barycentric coordinate according to the following formula:
x_2 = (P_11 + P_21 + P_31 + P_41) / 4, y_2 = (P_12 + P_22 + P_32 + P_42) / 4 (14)
Wherein [P_11, P_12], [P_21, P_22], [P_31, P_32], [P_41, P_42] are the predicted vertex coordinates and (x_2, y_2) is the second barycentric coordinate.
The computer device calculates a distance between an abscissa in the first barycentric coordinates and an abscissa in the second barycentric coordinates, calculates a distance between an ordinate in the first barycentric coordinates and an ordinate in the second barycentric coordinates, and determines a geometric barycentric loss between the predicted geometric region and the predicted polygonal region from the distance between the abscissas and the distance between the ordinates. Further, the computer device takes the sum of the square of the distance between the abscissas and the square of the distance between the ordinates as the geometric center of gravity loss between the predicted geometric region and the predicted polygon region. For example, the computer device may calculate the geometric center of gravity loss according to the following formula:
Loss_2 = (x_1 - x_2)² + (y_1 - y_2)² (15)
Where Loss_2 is the geometric center of gravity loss and (x_1, y_1) is the first barycentric coordinate.
In one embodiment, the computer device may obtain weights corresponding to the first barycentric coordinate and the second barycentric coordinate, multiply each barycentric coordinate by its corresponding weight, sum the two products, and use the sum as the geometric center of gravity loss.
In this embodiment, the center of gravity loss between the geometric region predicted by the segmentation network and the predicted polygonal region can be determined from the barycentric coordinates of the predicted geometric region and the barycentric coordinates of the predicted polygonal region. This captures the difference in gravity centers between the prediction results obtained when the same segmentation network predicts the target geometric figure in the two ways, so using the center of gravity loss as a training condition gives the trained segmentation network higher accuracy and segmentation precision. The trained segmentation network can then support both prediction modes: segmenting the target geometric figure and predicting the vertex coordinates of the target geometric figure.
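A sketch of the geometric center of gravity loss under the usual reading of the formulas above: the gravity center of the predicted geometric region is taken as the gray-value-weighted mean of the row and column indices, and the gravity center of the predicted polygonal region as the mean of its four predicted vertex coordinates. The mapping of rows to the abscissa and columns to the ordinate follows the description above and is an assumption; in practice the two gravity centers must simply use the same axis convention.

```python
import numpy as np

def mask_centroid(mask: np.ndarray):
    """First barycentric coordinate: gray-value-weighted mean row/column index."""
    area = mask.sum() + 1e-8
    rows = np.arange(mask.shape[0])[:, None]    # i, the row index
    cols = np.arange(mask.shape[1])[None, :]    # j, the column index
    x1 = (rows * mask).sum() / area             # abscissa
    y1 = (cols * mask).sum() / area             # ordinate
    return x1, y1

def vertex_centroid(vertices):
    """Second barycentric coordinate: mean of the predicted vertex coordinates, cf. (14)."""
    xs = [v[0] for v in vertices]
    ys = [v[1] for v in vertices]
    return sum(xs) / len(xs), sum(ys) / len(ys)

def center_of_gravity_loss(mask: np.ndarray, vertices) -> float:
    """Equation (15): squared distance between the two gravity centers."""
    x1, y1 = mask_centroid(mask)
    x2, y2 = vertex_centroid(vertices)
    return (x1 - x2) ** 2 + (y1 - y2) ** 2
```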
In one embodiment, determining the first barycentric coordinates of the prediction geometry region according to the first gray values respectively corresponding to the pixel points in the prediction geometry region includes:
Constructing a gray value matrix based on the first gray values respectively corresponding to the pixel points in the prediction geometric region; taking the contrast value of the sum of the first gray values of each row in the gray value matrix and the first area of the prediction geometric area as the abscissa in the first barycentric coordinates of the prediction geometric area; and taking the comparison value of the sum of the first gray values of each column in the gray value matrix and the first area of the prediction geometric area as the ordinate in the first barycentric coordinates.
Wherein the comparison value comprises a ratio, a difference value, a percentage and the like.
Specifically, the computer equipment obtains first gray values corresponding to each pixel point in the prediction geometric area, and constructs a gray value matrix according to the first gray values. The computer equipment calculates a first area of the prediction geometric area according to the first gray value corresponding to each pixel point. The computer device may sum the first gray values of all rows in the gray value matrix, calculate a contrast value between the sum of the first gray values of all rows and the first area of the prediction geometry, and take the contrast value as an abscissa in the first barycentric coordinates.
The computer device may sum the first gray values of all columns in the gray value matrix, calculate a contrast value between the sum of the first gray values of all columns and the first area of the prediction geometry, and take the contrast value as an ordinate in the first barycentric coordinate. And obtaining a first barycentric coordinate of the prediction geometric region according to the abscissa and the ordinate.
In one embodiment, the contrast value refers to the ratio. The computer device calculates the ratio between the sum of the first gray values of all rows in the gray value matrix and the first region area of the prediction geometric region, and takes this ratio as the abscissa. The computer device calculates the ratio between the sum of the first gray values of all columns in the gray value matrix and the first region area of the prediction geometric region, and takes this ratio as the ordinate. For example, the computer device may calculate the first barycentric coordinate according to the following formula:
Where x_1 is the abscissa and y_1 is the ordinate of the first barycentric coordinate, i is a row in the gray value matrix and j is a column in the gray value matrix, Σ_{i,j} I_{i,j}(x) is the first region area of the prediction geometric region, and I_{i,j}(x) is the gray value of the pixel in the ith row and jth column.
In another embodiment, the contrast value is a percentage. The computer device calculates the percentage of the sum of the first gray values of all rows in the gray value matrix relative to the first region area of the prediction geometric region and takes it as the abscissa, and likewise calculates the percentage of the sum of the first gray values of all columns relative to the first region area and takes it as the ordinate.
In this embodiment, the barycentric coordinates of the prediction geometric region can be calculated from the gray values of the pixels by taking the contrast value between the sum of the first gray values of the rows of the gray value matrix and the first region area as the abscissa, and the contrast value between the sum of the first gray values of the columns and the first region area as the ordinate.
As shown in fig. 7, which is a flow diagram of testing the segmentation network in one embodiment, the computer device may collect sample videos including the target geometry under different circumstances and obtain, from each frame of the sample video, a sample image including the target geometry. The computer device inputs each acquired sample image into the trained segmentation network to predict the predicted geometric region and predicted vertex information corresponding to the target geometric figure in each frame, and tests the segmentation network through the predicted geometric region and the predicted vertex information.
As shown in fig. 8, a training architecture diagram of a split network in one embodiment.
The computer device performs downsampling processing on the sample image to obtain a corresponding sample sampling image. The sample image and the sample sampling image are input into the segmentation network to be trained, which may include a segmentation branch and a vertex prediction branch. The segmentation network to be trained performs convolution processing on the sample image through a plurality of convolution layers of the segmentation branch, for example 4 convolution layers, so as to extract features and obtain a first feature map output by each convolution layer.
The segmentation network to be trained carries out convolution processing on the sample sampled image through a plurality of convolution layers of the vertex prediction branch, for example, 4 layers of convolution layers, so as to extract features and obtain a second feature map output by each convolution layer.
A 1*1 convolution is applied to the first feature map output by the first convolution layer of the segmentation branch, and up-sampling processing is performed on the feature map after the convolution processing.
And starting from the second convolution layer of the segmentation branch, performing splicing processing on the first feature map of the second convolution layer and the second feature map of the first convolution layer of the vertex prediction branch to obtain a sample fusion feature map until the sample fusion feature map output by the last convolution layer of the segmentation branch is obtained. And performing 1*1 convolution processing on each sample fusion feature map, performing up-sampling processing on feature maps obtained after 1*1 convolution processing, and fusing each feature map obtained after up-sampling processing with the feature map obtained after up-sampling processing corresponding to the first convolution layer to obtain a prediction geometric region in the sample image.
A 1*1 convolution is applied to the second feature map output by the last convolution layer of the vertex prediction branch, and the feature map obtained by the 1*1 convolution processing is output to a full connection layer. Full connection processing is performed on this feature map through the full connection layer to obtain the predicted vertex information corresponding to the target geometric figure in the sample sampling image.
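A highly simplified PyTorch-style sketch of the two-branch layout described above: a segmentation branch over the sample image, a vertex prediction branch over the sample sampling image, per-layer splicing of the two sets of feature maps, 1*1 convolutions with up-sampling for the segmentation output, and a full connection layer for the four predicted vertices. Channel counts, the number of layers and the fixed 4-vertex output are illustrative assumptions rather than the exact architecture described here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(c_in, c_out):
    # one "convolution layer" of a branch: 3x3 conv with stride 2, BN, ReLU
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, stride=2, padding=1),
                         nn.BatchNorm2d(c_out), nn.ReLU(inplace=True))

class TwoBranchSegNet(nn.Module):
    """Segmentation branch over the sample image, vertex branch over the sampled image."""
    def __init__(self, channels=(16, 32, 64, 128)):
        super().__init__()
        cs = (3,) + tuple(channels)
        self.seg_layers = nn.ModuleList(conv_block(cs[i], cs[i + 1]) for i in range(4))
        self.vtx_layers = nn.ModuleList(conv_block(cs[i], cs[i + 1]) for i in range(4))
        # 1*1 convolutions applied to the (spliced) feature maps before up-sampling
        self.seg_heads = nn.ModuleList(
            nn.Conv2d(channels[i] if i == 0 else channels[i] + channels[i - 1], 1, 1)
            for i in range(4))
        self.vtx_head = nn.Conv2d(channels[-1], 8, 1)  # 1*1 conv before the full connection layer
        self.fc = nn.LazyLinear(8)                     # 4 vertices * (x, y)

    def forward(self, image, image_small):
        h, w = image.shape[-2:]
        vtx_feats, x = [], image_small
        for layer in self.vtx_layers:                  # second feature maps
            x = layer(x)
            vtx_feats.append(x)
        seg_maps, x = [], image
        for i, layer in enumerate(self.seg_layers):    # first feature maps
            x = layer(x)
            if i == 0:
                feat = x
            else:                                      # splice with the vertex-branch feature map
                prev = F.interpolate(vtx_feats[i - 1], size=x.shape[-2:])
                feat = torch.cat([x, prev], dim=1)     # sample fusion feature map
            seg_maps.append(F.interpolate(self.seg_heads[i](feat), size=(h, w)))
        mask = torch.sigmoid(sum(seg_maps))            # predicted geometric region
        verts = self.fc(torch.flatten(self.vtx_head(vtx_feats[-1]), 1))
        return mask, verts.view(-1, 4, 2)              # predicted vertex coordinates
```

In use, the sample sampling image would typically be the sample image downsampled by a fixed factor, for example obtained with F.interpolate(image, scale_factor=0.5).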
And determining corresponding region color loss and region segmentation loss according to the prediction geometric region and the labeling geometric region. Corresponding geometric area loss and geometric center of gravity loss are determined from the predicted geometric area and the predicted polygonal area determined from the predicted vertex information.
A target loss function is constructed based on the region color loss, region segmentation loss, geometric area loss, and geometric center of gravity loss. Training the segmentation network to be trained through the target loss function until the training stopping condition is reached, and obtaining the trained target segmentation network.
For example, the target loss function is: Loss = Loss_0 + Loss_1 + Loss_2 + Loss_3 (17)
Where Loss_0 is the region segmentation loss, Loss_1 is the geometric area loss, Loss_2 is the geometric center of gravity loss, and Loss_3 is the region color loss.
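A hedged sketch of one training step with the combined loss of equation (17); the four loss callables stand for implementations of the region segmentation, geometric area, geometric center of gravity and region color losses sketched earlier, and are passed in rather than defined here.

```python
def training_step(model, optimizer, batch, losses) -> float:
    """One optimization step with Loss = Loss0 + Loss1 + Loss2 + Loss3, cf. equation (17).

    `losses` is a dict of callables, e.g. {"seg": ..., "area": ..., "centroid": ..., "color": ...}.
    """
    image, image_small, gt_mask = batch
    pred_mask, pred_vertices = model(image, image_small)
    loss = (losses["seg"](pred_mask, gt_mask)               # region segmentation loss, Loss0
            + losses["area"](pred_mask, pred_vertices)      # geometric area loss, Loss1
            + losses["centroid"](pred_mask, pred_vertices)  # geometric center of gravity loss, Loss2
            + losses["color"](image, pred_mask, gt_mask))   # region color loss, Loss3
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)
```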
In one embodiment, the method further comprises: acquiring an image to be processed, and sampling the image to be processed to obtain a corresponding sampled image to be processed; respectively extracting features of an image to be processed and a sampling image to be processed through a trained target segmentation network to obtain a third feature image corresponding to the image to be processed and a fourth feature image corresponding to the sampling image to be processed; and carrying out fusion processing on the third feature map and the fourth feature map, and determining a target geometric figure in the image to be processed based on the target fusion feature map after the fusion processing.
Specifically, the computer equipment performs downsampling processing on the image to be processed to obtain a corresponding downsampled image to be processed. Or the computer equipment performs up-sampling processing on the image to be processed to obtain a corresponding up-sampled image to be processed.
In one embodiment, the computer device may sample the image to be processed through the trained segmentation network to obtain a corresponding sampled image to be processed. Further, up-sampling processing or down-sampling processing is performed on the image to be processed through the segmentation network, so that a corresponding up-sampling image to be processed or down-sampling image to be processed is obtained.
Specifically, the segmentation network performs multi-layer feature extraction on the image to be processed to obtain a third feature map corresponding to the image to be processed. Further, the trained target segmentation network performs feature extraction on the image to be processed through multiple convolution layers, taking the third feature map output by the previous convolution layer as the input of the next convolution layer, so as to obtain the third feature map output by each convolution layer.
And carrying out multi-layer feature extraction on the sampled image to be processed by the segmentation network to obtain a fourth feature image corresponding to the sampled image to be processed. Further, the segmentation network performs feature extraction on the sampled image to be processed through the multi-layer convolution layers, and takes the fourth feature image output by the previous convolution layer as the input of the next convolution layer to obtain the fourth feature image output by each layer of convolution layer.
In one embodiment, the partitioning network includes two encoders, each of which includes multiple convolutional layers. The segmentation network inputs the image to be processed and the sampling image to be processed into two encoders respectively, and performs feature extraction on the image to be processed and the sampling image to be processed respectively to obtain a corresponding third feature map and a corresponding fourth feature map.
Specifically, the segmentation network performs fusion processing on the third feature map and the fourth feature map to obtain a target fusion feature map. And carrying out 1*1 convolution processing on the target fusion feature map by the segmentation network, and carrying out sampling processing on the feature map subjected to 1*1 convolution processing to obtain a target geometric figure in the image to be processed.
In this embodiment, the fusing processing of the third feature map and the fourth feature map may be to splice or superimpose the third feature map and the fourth feature map.
In this embodiment, the image to be processed is sampled and processed to obtain images with different sizes, and the images with different sizes are respectively subjected to feature extraction through the trained segmentation network, so that feature images with different resolutions are obtained, and the feature images with different resolutions are subjected to fusion processing, so that the target geometric figure in the image to be processed can be more accurately identified, and the accuracy of identifying and segmenting the target geometric figure can be improved.
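Assuming a trained network along the lines of the sketch given earlier, inference on an image to be processed could look as follows; TwoBranchSegNet is the illustrative model defined above, and the input size and threshold are arbitrary example values.

```python
import torch
import torch.nn.functional as F

model = TwoBranchSegNet()                              # illustrative trained model
model.eval()
image = torch.rand(1, 3, 256, 256)                     # image to be processed
image_small = F.interpolate(image, scale_factor=0.5)   # sampled image to be processed
with torch.no_grad():
    mask, vertices = model(image, image_small)         # fused segmentation + vertex outputs
target_region = (mask > 0.5).float()                   # segmented target geometric figure
```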
In one embodiment, the method further comprises: determining the intersection ratio between the area of the target geometric figure and the area of the preset area; when the intersection ratio is greater than or equal to a threshold value, segmenting a target geometric figure from the image to be processed; and performing corresponding business processing based on the content included in the target geometry.
Specifically, the computer device may obtain gray values corresponding to each pixel point in the target geometry, and determine an area of the target geometry according to each gray value. The computer device may obtain the area of the preset region, calculate an intersection between the area of the target geometry and the area of the preset region, and a union between the area of the target geometry and the area of the preset region, thereby calculating a ratio of the intersection and the union, and obtaining an intersection ratio of the areas.
When the intersection ratio of the areas is greater than or equal to the threshold value, the computer device may segment the target geometry from the image to be processed. The computer device may obtain the content in the target geometry and perform corresponding business processing based on the content in the target geometry. The business processing may be operations such as service data query, service data modification, service data update and service data deletion, but is not limited thereto.
In this embodiment, when the intersection ratio between the area of the target geometry and the area of the preset area is greater than or equal to the threshold, it indicates that verification of the target geometry is successful, so that service handling is performed based on information in the target geometry after verification is successful, and the security of service handling can be improved.
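A sketch of the verification step: the intersection-over-union between the segmented target region and a preset region, both given as binary masks; the threshold value is application specific and the value used here is only an example.

```python
import numpy as np

def passes_region_check(target_mask: np.ndarray, preset_mask: np.ndarray,
                        threshold: float = 0.5) -> bool:
    """True when the intersection ratio (IoU) of the two regions reaches the threshold."""
    target = target_mask > 0.5
    preset = preset_mask > 0.5
    intersection = np.logical_and(target, preset).sum()
    union = np.logical_or(target, preset).sum()
    iou = intersection / union if union > 0 else 0.0
    return iou >= threshold
```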
In one embodiment, the method further comprises: acquiring an image to be processed, and extracting the characteristics of the image to be processed through a trained target segmentation network; and carrying out convolution and full connection processing on the extracted image features so as to output target vertex information corresponding to the target geometric figure in the image to be processed.
Specifically, the computer device acquires an image to be processed and inputs it into the trained target segmentation network. The segmentation network extracts features from the image to be processed through multiple convolution layers, taking the feature map output by the previous convolution layer as the input of the next convolution layer, until the feature map output by the last convolution layer is obtained. The segmentation network performs 1*1 convolution processing on the feature map output by the last convolution layer and outputs the resulting feature map to the full connection layer. Full connection processing is performed on the feature map through the full connection layer to obtain the target vertex information corresponding to the target geometric figure in the image to be processed.
In this embodiment, feature extraction is performed on an image to be processed based on the target segmentation network, and convolution and full connection processing is performed on the extracted image features, so that the target vertex information corresponding to the target geometric figure in the image to be processed can be accurately identified, the target geometric figure can be segmented from the image to be processed based on the target vertex information, and the accuracy and precision of identification and segmentation are improved.
In one embodiment, the target vertex information includes target vertex coordinates; the method further comprises the steps of:
Determining a target polygon area formed by the coordinates of the target vertexes; determining the cross-over ratio between the area of the target polygonal area and the area of the preset area; when the intersection ratio is greater than or equal to a threshold value, a target polygon area formed by target vertex coordinates is segmented from the image to be processed; and performing corresponding business processing based on the target polygonal area.
Specifically, the computer device may calculate the area of the target polygonal area from the target vertex coordinates. The computer device may obtain the area of the preset region, calculate the intersection between the area of the target polygonal area and the area of the preset region, and the union between the area of the target polygonal area and the area of the preset region, thereby calculating the ratio of the intersection and the union and obtaining the intersection ratio of the areas.
When the intersection ratio of the areas is greater than or equal to the threshold value, the computer device may segment the target polygon area formed by the target vertex coordinates from the image to be processed. The computer device may obtain the content in the target polygonal area and perform corresponding business processing based on the content in the target polygonal area.
In this embodiment, when the intersection ratio between the area of the target polygonal area and the area of the preset area is greater than or equal to the threshold, it indicates that verification of the target polygonal area is successful, so that service handling is performed based on information in the target geometric figure after verification is successful, and the security of service handling can be improved.
In one embodiment, the sample image is an image including a target document, the target geometry is a quadrilateral formed based on the boundary of the target document, the predicted vertex information corresponding to the target geometry includes the vertex coordinates corresponding to each of the four vertices of the target document, and the segmentation network is a document segmentation network.
Wherein the target document is a quadrilateral document for characterizing user information including, but not limited to, personal identification cards, residence cards, driver's licenses, passports.
Specifically, the computer device obtains a document image including a target document and determines a marked document region corresponding to the target document based on the document image. And carrying out target segmentation processing on the certificate image through a certificate segmentation network to be trained to obtain a predicted certificate area corresponding to the target certificate, and determining predicted vertex coordinates corresponding to the target certificate based on image characteristics in the target segmentation processing. And determining the corresponding regional characteristic loss according to the predicted certificate region and the marked certificate region. Corresponding geometric feature loss is determined from the predicted document region and the predicted polygon region determined by the predicted vertex coordinates. The computer device constructs an objective loss function based on the region feature loss and the geometric feature loss. Training the certificate segmentation network to be trained through the target loss function until the training stopping condition is reached, and obtaining a trained certificate segmentation network; the document segmentation network is used to segment a target document from a document image.
In this embodiment, the document image including the target document is subjected to the target segmentation processing by the document segmentation network to be trained, so that the predicted document region predicted by the document segmentation network and corresponding to the target document can be obtained. Based on the image characteristics in the target segmentation process, the predicted vertex coordinates of the target certificate predicted by the certificate segmentation network in the certificate image can be obtained. And constructing a target loss function according to the regional characteristic loss between the predicted certificate region and the marked certificate region and the geometric characteristic loss between the predicted certificate region and the predicted polygon region determined by the predicted vertex coordinates, so that the target loss function comprises multiple loss characteristics. The certificate segmentation network to be trained is trained based on various losses, and the influence of the losses in all aspects on the identification and segmentation of the certificate segmentation network can be fully considered, so that the accuracy and the precision of the identification and segmentation of the certificate segmentation network can be improved through training. The target certificate can be accurately identified and segmented from the image through the trained target certificate segmentation network. And the vertex coordinates of the target certificate in the image can be accurately output, so that the target certificate in the image can be accurately positioned.
As shown in fig. 9, an application scenario of the target split network in one embodiment is shown.
The front end collects the certificate video through an SDK (software development kit) and sends the certificate video to the background. The background uses a target segmentation network to segment target geometry from the document video and outputs target vertex coordinates of the target geometry in the document video. The background returns the target geometry and target vertex coordinates to the front end. And the front end calculates the cross-over ratio between the area of the target geometric figure and the area of the preset area, and when the cross-over ratio is larger than or equal to the threshold value, the front end judges that the verification of the target certificate is successful, and the user is allowed to perform corresponding business processing.
Or the front end determines a target polygonal area formed by the target vertex coordinates, calculates the cross-over ratio between the area of the target polygonal area and the area of the preset area, and when the cross-over ratio is greater than or equal to a threshold value, judges that the verification of the target polygonal area is successful, and allows the user to perform corresponding business processing.
In one embodiment, a training method for a split network is provided, including:
And step (S1), acquiring a sample image comprising the target geometric figure, and determining an annotation geometric region corresponding to the target geometric figure based on the sample image.
And step (S2), sampling the sample image to obtain a corresponding sample sampling image.
And step (S3), respectively carrying out feature extraction on the sample image and the sample sampling image through a segmentation network to be trained to obtain a first feature map corresponding to the sample image and a second feature map corresponding to the sample sampling image.
And step (S4), the first feature map and the second feature map are subjected to fusion processing, and a predicted geometric area corresponding to the target geometric figure is obtained based on the fused sample fusion feature map after the fusion processing.
And (S5) performing convolution and full connection processing on the second feature map to obtain predicted vertex information corresponding to the target geometric figure.
And step (S6), acquiring the values of the color channels corresponding to the pixel points in the prediction geometric region, and acquiring the values of the color channels corresponding to the pixel points in the labeling geometric region.
And step (S7), determining a first color value corresponding to the prediction geometric region according to the value of the color channel corresponding to each pixel point in the prediction geometric region, the first gray value corresponding to the corresponding pixel point and the first region area.
And step (S8), determining a second color value corresponding to the labeling geometric region according to the value of the color channel corresponding to each pixel point in the labeling geometric region, the second gray value corresponding to the corresponding pixel point and the second region area.
And (S9) determining the regional color loss between the prediction geometric region and the labeling geometric region based on the difference between the first color value and the second color value.
And step (S10), obtaining first gray values corresponding to the pixel points in the prediction geometric area, and taking the sum of the first gray values of the pixel points in the prediction geometric area as the first area of the prediction geometric area.
And step (S11), obtaining second gray values corresponding to the pixel points in the labeling geometric region, and taking the sum of the second gray values of the pixel points in the labeling geometric region as the second region area of the labeling geometric region.
A step (S12) of determining a region segmentation loss between the predicted geometric region and the labeling geometric region based on the first region area and the second region area; determining a first area of the predicted geometric region and calculating a third area of the predicted polygonal region determined by the predicted vertex information; the geometric area loss is determined from the difference between the first area and the third area.
And (S13) constructing a gray value matrix based on the first gray values corresponding to the pixel points in the prediction geometric region.
And (S14) taking the comparison value of the sum of the first gray values of each row in the gray value matrix and the first area of the prediction geometric area as the abscissa in the first barycentric coordinates of the prediction geometric area.
And (S15) taking the comparison value of the sum of the first gray values of each column in the gray value matrix and the first area of the prediction geometric area as the ordinate in the first barycentric coordinates.
A step (S16) of determining second center coordinates of the predicted polygon area formed by the predicted vertex coordinates based on the predicted vertex coordinates in the predicted vertex information; a geometric center of gravity loss between the predicted geometric region and the predicted polygon region is determined based on a distance between the first center of gravity coordinate and the second center of gravity coordinate.
And (S17) constructing a target loss function based on the region color loss, the region segmentation loss, the geometric area loss and the geometric center of gravity loss.
And step (S18), training the segmented network to be trained through the target loss function until the training stopping condition is reached, and obtaining the trained target segmented network.
And step (S19), acquiring an image to be processed, and sampling the image to be processed to obtain a corresponding sampled image to be processed.
And step (S20), respectively carrying out feature extraction on the image to be processed and the sampled image to be processed through the trained target segmentation network to obtain a third feature map corresponding to the image to be processed and a fourth feature map corresponding to the sampled image to be processed.
And step (S21), fusing the third feature map and the fourth feature map, and determining a target geometric figure in the image to be processed based on the fused target fusion feature map.
And step (S22), performing convolution and full connection processing on the fourth feature map to output target vertex information corresponding to the target geometric figure in the image to be processed.
Step (S23), determining the intersection ratio between the area of the target geometric figure and the area of the preset area; and when the intersection ratio is greater than or equal to the threshold value, dividing a target geometric figure from the image to be processed, and carrying out corresponding business processing based on the content included in the target geometric figure.
In this embodiment, the target segmentation processing is performed on the sample image including the target geometry through the segmentation network to be trained, so that the predicted geometry region predicted by the segmentation network and corresponding to the target geometry can be obtained. And determining the color loss of the region based on the color difference between the predicted geometric region and the marked geometric region, and determining the color loss between the geometric region predicted by the segmentation network and the real geometric region. Based on the area of the predicted geometric region and the area of the marked geometric region, the segmentation difference between the geometric region and the real geometric region which is predicted by the segmentation network can be determined.
The geometric area loss is determined based on the area difference between the predicted geometric area and the predicted polygonal area, so that the area loss between the geometric area predicted by the segmentation network and the polygonal area can be determined, and the large area loss indicates inaccurate prediction and segmentation of the segmentation network. Based on the barycenter position of the predicted geometric region and the barycenter position of the predicted polygonal region, a barycenter loss between the geometric region predicted by the segmentation network and the predicted polygonal region can be determined, and a large barycenter loss indicates inaccurate prediction and segmentation of the segmentation network.
The color loss, the segmentation loss, the area loss and the gravity center loss are used as conditions for training the segmentation network, the segmentation network to be trained is trained based on multiple aspects of loss, and the influence of the loss on the segmentation network identification and segmentation can be fully considered, so that the segmentation network identification and segmentation accuracy and precision can be improved through training. The target geometric figure can be accurately identified and segmented from the image through the trained target segmentation network. And the vertex information of the target geometric figure in the image can be accurately output, so that the target geometric figure in the image to be processed can be accurately positioned.
The application also provides an application scene, which applies the training method of the certificate segmentation network. Specifically, the application of the training method of the certificate segmentation network in the application scene is as follows:
The computer device obtains a sample document image including the target document and determines a labeling document region corresponding to the target document based on the sample document image. And the computer equipment samples the sample certificate image to obtain a corresponding sample sampling image. And the computer equipment respectively performs feature extraction on the sample certificate image and the sample sampling image through a certificate segmentation network to be trained to obtain a first feature map corresponding to the sample certificate image and a second feature map corresponding to the sample sampling image.
And the computer equipment performs fusion processing on the first feature map and the second feature map, and obtains a predicted certificate area corresponding to the target certificate based on the fused feature map.
The computer device performs convolution and full connection processing on the second feature map to obtain the predicted vertex coordinates [P_11, P_12], [P_21, P_22], [P_31, P_32], [P_41, P_42] corresponding to the target document.
The computer device obtains the values of the color channels corresponding to the pixel points in the predicted document region and the values of the color channels corresponding to the pixel points in the labeling document region. The value of the color channel corresponding to each pixel point in the predicted document region, the first gray value corresponding to the corresponding pixel point and the first region area are substituted into formula (3) to calculate the first color value C corresponding to the predicted document region.
Similarly, a second color value C_0 corresponding to the labeling document region is determined according to the value of the color channel corresponding to each pixel in the labeling document region, the second gray value corresponding to the corresponding pixel, and the second region area.
The first color value C and the second color value C_0 are substituted into formula (1) to obtain the region color loss Loss_3 between the predicted document region and the labeling document region:
Loss_3 = |C_0 - C| (1)
Acquiring first gray values corresponding to each pixel point in the predicted document region, and taking the sum of the first gray values of the pixel points in the predicted document region as the first region area of the predicted document region, namely:
S_1 = Σ_{i,j} I_{i,j}(x) (11)
Second gray values corresponding to the pixel points in the labeling document region are obtained, and the sum of the second gray values of the pixel points in the labeling document region is taken as the second region area Σ_{i,j} GT_{i,j}(x) of the labeling document region.
Based on the first region area Σ_{i,j} I_{i,j}(x) and the second region area Σ_{i,j} GT_{i,j}(x), the region segmentation loss Loss_0 between the predicted document region and the labeling document region is calculated.
The first region area of the predicted document region is determined, and the third region area S_2 of the predicted polygonal region determined by the predicted vertex information is calculated according to the shoelace formula (12).
From the first region area S_1 and the third region area S_2, the document area loss Loss_1 is calculated:
Loss_1 = |S_1 - S_2| (13)
A gray value matrix is constructed based on the first gray values respectively corresponding to the pixel points in the predicted document region. The contrast value between the sum of the first gray values of each row in the gray value matrix and the first region area of the predicted document region is taken as the abscissa x_1 of the first barycentric coordinate (x_1, y_1) of the predicted document region, and the contrast value between the sum of the first gray values of each column in the gray value matrix and the first region area of the predicted document region is taken as the ordinate y_1 of the first barycentric coordinate.
Based on the predicted vertex coordinates in the predicted vertex information, the second barycentric coordinate (x_2, y_2) of the predicted polygonal region formed by the predicted vertex coordinates is determined according to formula (14).
Based on the distance between the first barycentric coordinate (x_1, y_1) and the second barycentric coordinate (x_2, y_2), the document center of gravity loss Loss_2 between the predicted document region and the predicted polygonal region is determined:
Loss_2 = (x_1 - x_2)² + (y_1 - y_2)² (15)
Based on the region color loss Loss_3, the region segmentation loss Loss_0, the document area loss Loss_1 and the document center of gravity loss Loss_2, the target loss function Loss is constructed:
Loss = Loss_0 + Loss_1 + Loss_2 + Loss_3 (17)
Training the certificate segmentation network to be trained through the target loss function until the loss value of the certificate segmentation network is smaller than or equal to the loss threshold value, and obtaining the trained target certificate segmentation network.
When the user needs to transact the related business of the bank, the front end of the computer equipment shoots the identity document image of the user and transmits the acquired identity document image to the background.
The background performs target segmentation processing through the trained target certificate segmentation network to obtain an identity card region in the identity card image, and can also output 4 vertex coordinates of the identity card in the identity card image. And the background returns the identity card area and the vertex coordinates to the front end.
The front end calculates the cross-over ratio between the area of the target certificate and the area of the preset area, and when the cross-over ratio is greater than or equal to a threshold value, the user is allowed to transact banking related to individuals, such as opening a banking card, inquiring, modifying banking reservation information and the like.
Or the front end calculates the intersection ratio between the area of the quadrilateral area formed by the 4 vertex coordinates and the area of the preset area, and when the intersection ratio is greater than or equal to a threshold value, the user is allowed to conduct personal related banking business, such as opening a bank card, inquiring, modifying bank reservation information and the like.
It should be understood that, although the steps in the flowcharts of fig. 2-9 are shown in the order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the order of execution of these steps is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in fig. 2-9 may include multiple sub-steps or stages that are not necessarily performed at the same time but may be performed at different times, and these sub-steps or stages are not necessarily performed sequentially; they may be performed in turn or alternately with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 10, a training apparatus for a split network is provided, where the training apparatus may use a software module or a hardware module, or a combination of the two, and forms a part of a computer device, and the training apparatus 1000 for a split network specifically includes: an acquisition module 1002, a prediction module 1004, a region feature loss determination module 1006, a geometric feature loss determination module 1008, a construction module 1010, and a training module 1012, wherein:
an obtaining module 1002 is configured to obtain a sample image including a target geometry, and determine a labeling geometric region corresponding to the target geometry based on the sample image.
The prediction module 1004 is configured to perform target segmentation processing on the sample image through a segmentation network to be trained, obtain a predicted geometric area corresponding to the target geometry, and determine predicted vertex information corresponding to the target geometry based on image features in the target segmentation processing.
The region feature loss determination module 1006 is configured to determine a corresponding region feature loss according to the predicted geometric region and the labeled geometric region.
The geometric feature loss determination module 1008 is configured to determine a corresponding geometric feature loss according to the predicted geometric region and the predicted polygon region determined by the predicted vertex information.
A construction module 1010 is configured to construct an objective loss function based on the region feature loss and the geometric feature loss.
The training module 1012 is configured to train the segmented network to be trained through the target loss function, and stop until reaching a training stop condition, so as to obtain a trained target segmented network; the object segmentation network is used for segmenting an object geometric figure from the image to be processed.
In this embodiment, the target segmentation processing is performed on the sample image including the target geometry through the segmentation network to be trained, so that the predicted geometry region predicted by the segmentation network and corresponding to the target geometry can be obtained. Based on the image features in the target segmentation process, the predicted vertex information of the predicted target geometry predicted by the segmentation network in the sample image can be obtained. And constructing a target loss function according to the regional characteristic loss between the predicted geometric region and the labeling geometric region and the geometric characteristic loss between the predicted geometric region and the predicted polygonal region determined by the predicted vertex information, so that the target loss function comprises multiple loss characteristics. The segmentation network to be trained is trained based on the loss in multiple aspects, and the influence of the loss in all aspects on the recognition and segmentation of the segmentation network can be fully considered, so that the recognition and segmentation accuracy of the segmentation network can be improved through training. The target geometric figure can be accurately identified and segmented from the image through the trained target segmentation network. And the vertex information of the target geometric figure in the image can be accurately output, so that the target geometric figure in the image to be processed can be accurately positioned.
In one embodiment, the prediction module 1004 is further configured to: sampling the sample image to obtain a corresponding sample sampling image; respectively carrying out feature extraction on a sample image and a sample sampling image through a segmentation network to be trained to obtain a first feature image corresponding to the sample image and a second feature image corresponding to the sample sampling image; carrying out fusion processing on the first feature map and the second feature map, and obtaining a predicted geometric area corresponding to the target geometric figure based on the fused sample fusion feature map; and carrying out convolution and full connection processing on the second feature map to obtain predicted vertex information corresponding to the target geometric figure.
In this embodiment, the sample image is sampled to obtain a sample image, and the sample image and the sample image are respectively subjected to feature extraction through the to-be-trained segmentation network, so that features of images with different resolutions can be obtained. The fusion processing is carried out based on the extracted first feature image and the second feature image, so that the prediction geometric area where the target geometric figure is located in the sample image can be predicted more accurately based on the image features with different resolutions. And outputting predicted vertex information corresponding to the target geometric figure based on the full-connection processing of the second feature graph, so that the segmentation network is trained based on the predicted geometric region and the predicted vertex information obtained by different processing.
In one embodiment, the regional characteristic loss determination module 1006 is further configured to: determining a first color value corresponding to the prediction geometric region and a second color value corresponding to the labeling geometric region; determining a region color loss between the predicted geometric region and the labeling geometric region based on the difference of the first color value and the second color value; determining a first area corresponding to the predicted geometric area and a second area corresponding to the marked geometric area; determining a region segmentation loss between the predicted geometric region and the labeling geometric region based on the first region area and the second region area; and determining the corresponding regional characteristic loss according to the regional color loss and the regional segmentation loss.
In this embodiment, the color loss of the region is determined based on the color difference between the predicted geometric region and the labeling geometric region, so that the color loss between the geometric region predicted by the segmentation network and the real geometric region can be determined. Based on the area of the predicted geometric area and the area of the marked geometric area, the segmentation difference between the geometric area and the real geometric area obtained by the prediction of the segmentation network can be determined, so that the color loss and the segmentation loss of the area are used as conditions for the training of the segmentation network, and the trained segmentation network has higher accuracy and segmentation precision.
In one embodiment, the regional characteristic loss determination module 1006 is further configured to: acquiring the values of color channels corresponding to the pixel points in the prediction geometric area respectively, and acquiring the values of color channels corresponding to the pixel points in the labeling geometric area respectively; determining a first color value corresponding to the prediction geometric region according to the value of the color channel corresponding to each pixel point in the prediction geometric region, a first gray value corresponding to the corresponding pixel point and the first region area; and determining a second color value corresponding to the labeling geometric region according to the value of the color channel corresponding to each pixel point in the labeling geometric region, the second gray value corresponding to the corresponding pixel point and the second region area.
In this embodiment, based on the values of the color channels corresponding to the pixel points in the prediction geometric region, together with the first gray values of the corresponding pixel points and the first region area, the average color value of the prediction geometric region can be accurately calculated. Likewise, based on the values of the color channels corresponding to the pixel points in the labeling geometric region, the second gray values of the corresponding pixel points and the second region area, the average color value of the labeling geometric region can be accurately calculated. Based on the difference between the average color value of the predicted geometric region and that of the labeling geometric region, the color loss between the geometric region predicted by the segmentation network and the real geometric region can be accurately determined, so that the color loss serves as a training condition of the segmentation network and the precision of the segmentation network can be improved.
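A small NumPy sketch of the gray-value-weighted mean color described above; the (H, W, 3) channel layout is an assumption of the example.

```python
import numpy as np

def region_mean_color(image_rgb, gray_mask, eps=1e-6):
    # image_rgb: (H, W, 3) color channel values; gray_mask: (H, W) gray values
    # of the region, whose sum is the region area
    area = gray_mask.sum()
    weighted = image_rgb * gray_mask[..., None]           # weight each channel by the gray value
    return weighted.sum(axis=(0, 1)) / (area + eps)       # per-channel mean color
```

The region color loss is then the difference between this value computed on the predicted region and on the labeling region.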
In one embodiment, the regional characteristic loss determination module 1006 is further configured to: acquiring a first gray value corresponding to each pixel point in the prediction geometric area, and acquiring a second gray value corresponding to each pixel point in the labeling geometric area; taking the sum of the first gray values of all pixel points in the prediction geometric area as a first area of the prediction geometric area; and taking the sum of the second gray values of the pixel points in the labeling geometric area as the second area of the labeling geometric area.
In this embodiment, the sum of the gray values corresponding to the pixel points is used as the region area, so that the area of the region formed by the pixels of the prediction geometric area can be accurately calculated, and the area of the region formed by the pixels of the labeling geometric area can likewise be accurately calculated.
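A tiny numeric example of this area definition: for a hard-labeled mask the sum of gray values is simply the pixel count, while a soft predicted mask yields a fractional area; the values below are made up for illustration.

```python
import numpy as np

# A hard-labeled region of three pixels and a soft predicted region of the
# same shape; in both cases the "area" is just the sum of gray values.
gt_mask = np.array([[0.0, 1.0, 1.0],
                    [0.0, 1.0, 0.0]], dtype=np.float32)
pred_mask = np.array([[0.0, 0.9, 0.8],
                      [0.1, 1.0, 0.2]], dtype=np.float32)
second_area = gt_mask.sum()     # 3.0 -> area of the labeling geometric region
first_area = pred_mask.sum()    # 3.0 -> area of the predicted geometric region
```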
In one embodiment, the geometric feature loss determination module 1008 is further configured to: determining a first area of the predicted geometric region and calculating a third area of the predicted polygonal region determined by the predicted vertex information; determining a geometric area loss according to the difference between the first area and the third area; determining a first gravity center position corresponding to the predicted geometric region and a second gravity center position corresponding to the predicted polygonal region; determining a geometric center of gravity loss according to a distance between the first center of gravity position and the second center of gravity position; and determining the geometric feature loss based on the geometric area loss and the geometric center of gravity loss.
In this embodiment, geometric area loss is determined based on the area difference between the predicted geometric area and the predicted polygonal area, so that the area loss between the geometric area predicted by the segmentation network and the polygonal area can be determined, and large area loss indicates inaccurate prediction and segmentation of the segmentation network. Based on the barycenter position of the predicted geometric region and the barycenter position of the predicted polygonal region, a barycenter loss between the geometric region predicted by the segmentation network and the predicted polygonal region can be determined, and a large barycenter loss indicates inaccurate prediction and segmentation of the segmentation network. Therefore, the area loss and the gravity center loss are used as conditions for training the segmentation network, so that the trained segmentation network has higher accuracy and segmentation precision.
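The following sketch makes the two terms concrete. It assumes the polygon area and centroid are computed with the shoelace formula and the region centroid is the gray-value-weighted mean of pixel coordinates; the unweighted sum of the two terms is likewise only an illustrative choice.

```python
import numpy as np

def polygon_area_and_centroid(vertices):
    # vertices: (N, 2) array of predicted (x, y) coordinates in order;
    # shoelace formula for the area and centroid of the predicted polygon
    x, y = vertices[:, 0], vertices[:, 1]
    xn, yn = np.roll(x, -1), np.roll(y, -1)
    cross = x * yn - xn * y
    area = 0.5 * cross.sum()
    cx = ((x + xn) * cross).sum() / (6 * area)
    cy = ((y + yn) * cross).sum() / (6 * area)
    return abs(area), np.array([cx, cy])

def geometric_feature_loss(pred_mask, vertices, eps=1e-6):
    # pred_mask: (H, W) gray values of the predicted geometric region
    region_area = pred_mask.sum()                                    # first region area
    ys, xs = np.indices(pred_mask.shape)
    region_centroid = np.array([(xs * pred_mask).sum(),
                                (ys * pred_mask).sum()]) / (region_area + eps)
    poly_area, poly_centroid = polygon_area_and_centroid(vertices)   # third region area
    area_loss = abs(region_area - poly_area)                         # geometric area loss
    centroid_loss = np.linalg.norm(region_centroid - poly_centroid)  # geometric center of gravity loss
    return area_loss + centroid_loss
```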
In one embodiment, the first center of gravity position comprises a first center of gravity coordinate and the second center of gravity position comprises a second center of gravity coordinate; the geometric feature loss determination module 1008 is further configured to: determining the first barycentric coordinate of the prediction geometric region according to first gray values respectively corresponding to all pixel points in the prediction geometric region; determining the second center of gravity coordinate of a predicted polygon area formed by the predicted vertex coordinates, based on the predicted vertex coordinates in the predicted vertex information; and determining a geometric center of gravity loss between the predicted geometric region and the predicted polygon region based on a distance between the first center of gravity coordinate and the second center of gravity coordinate.
In this embodiment, based on the barycenter coordinates of the predicted geometric area and the barycenter coordinates of the predicted polygonal area, the barycenter loss between the geometric area predicted by the segmentation network and the predicted polygonal area can be determined, so that when the same segmentation network predicts the target geometric figure in these two ways, the barycenter difference between the two prediction results is determined. Using this barycenter loss as a condition for training the segmentation network gives the trained segmentation network higher accuracy and segmentation precision, and the trained segmentation network can simultaneously support two prediction modes: segmenting the target geometric figure and predicting the vertex coordinates of the target geometric figure.
In one embodiment, the geometric feature loss determination module 1008 is further configured to: constructing a gray value matrix based on the first gray values respectively corresponding to the pixel points in the prediction geometric region; taking the comparison value of the sum of the first gray values of each row in the gray value matrix and the first area of the prediction geometric area as the abscissa in the first barycentric coordinates of the prediction geometric area; and taking the comparison value of the sum of the first gray values of each column in the gray value matrix and the first area of the prediction geometric area as the ordinate in the first barycentric coordinates.
In this embodiment, the barycenter coordinates of the prediction geometric area can be calculated based on the gray values of the pixels by using the comparison value of the sum of the first gray values of all rows and the area of the first region in the gray value matrix as the abscissa in the barycenter coordinates and the comparison value of the sum of the first gray values of all columns and the area of the first region as the ordinate.
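Read literally, the passage divides row sums and column sums of the gray-value matrix by the first region area; a common way to obtain a single barycenter coordinate from this is to weight each row or column sum by its index, which is what the sketch below assumes. The mapping of rows and columns onto the abscissa and ordinate depends on the matrix convention and is also an assumption here.

```python
import numpy as np

def barycenter_from_gray_matrix(gray, eps=1e-6):
    # gray: (H, W) matrix of first gray values of the predicted region
    area = gray.sum()                                  # first region area
    col_sums = gray.sum(axis=0)                        # one sum per column
    row_sums = gray.sum(axis=1)                        # one sum per row
    x_bar = (np.arange(gray.shape[1]) * col_sums).sum() / (area + eps)   # abscissa
    y_bar = (np.arange(gray.shape[0]) * row_sums).sum() / (area + eps)   # ordinate
    return x_bar, y_bar
```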
In one embodiment, the apparatus further comprises a segmentation module. The segmentation module is further used for acquiring an image to be processed, and sampling the image to be processed to obtain a corresponding sampled image to be processed; respectively extracting features of the image to be processed and the sampled image to be processed through the trained target segmentation network to obtain a third feature map corresponding to the image to be processed and a fourth feature map corresponding to the sampled image to be processed; and carrying out fusion processing on the third feature map and the fourth feature map, and determining the target geometric figure in the image to be processed based on the target fusion feature map after the fusion processing.
In this embodiment, the image to be processed is sampled and processed to obtain images with different sizes, and the images with different sizes are respectively subjected to feature extraction through the trained segmentation network, so that feature images with different resolutions are obtained, and the feature images with different resolutions are subjected to fusion processing, so that the target geometric figure in the image to be processed can be more accurately identified, and the accuracy of identifying and segmenting the target geometric figure can be improved.
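Reusing the TwoBranchSegNet sketch given earlier, inference on an image to be processed could look roughly like this; the random input tensor and the 0.5 mask threshold are placeholders, not values from the embodiment.

```python
import torch

# Hypothetical inference with the TwoBranchSegNet sketch defined earlier;
# the random tensor stands in for a real image to be processed.
model = TwoBranchSegNet()
model.eval()
image_to_process = torch.rand(1, 3, 256, 256)
with torch.no_grad():
    region_prob, _ = model(image_to_process)          # region branch only
target_mask = (region_prob > 0.5).squeeze()           # target geometry in the image
```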
In one embodiment, the apparatus further comprises a business processing module. The business processing module is further configured to:
determining the intersection ratio between the area of the target geometric figure and the area of the preset area; when the intersection ratio is greater than or equal to a threshold value, segmenting a target geometric figure from the image to be processed; and performing corresponding business processing based on the content included in the target geometry.
In this embodiment, when the intersection ratio between the area of the target geometry and the area of the preset area is greater than or equal to the threshold value, it indicates that verification of the target geometry is successful, so that business processing is performed based on the information in the target geometry after verification succeeds, and the security of business processing can be improved.
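A minimal sketch of this verification step, assuming both the predicted region and the preset area are available as masks of the same size; the 0.8 threshold and the bounding-box crop are illustrative choices.

```python
import numpy as np

def verify_and_crop(region_mask, preset_mask, image, threshold=0.8):
    # region_mask: predicted target-geometry gray mask; preset_mask: boolean mask
    # of the preset area (e.g. an on-screen guide frame); threshold is illustrative
    pred = region_mask > 0.5
    inter = np.logical_and(pred, preset_mask).sum()
    union = np.logical_or(pred, preset_mask).sum()
    if union == 0 or inter / union < threshold:
        return None                                   # verification failed
    ys, xs = np.nonzero(pred)                         # segment the target geometry
    return image[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
```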
In one embodiment, the apparatus further comprises a vertex prediction module. The vertex prediction module is further configured to: acquiring an image to be processed, and extracting features of the image to be processed through the trained target segmentation network; and carrying out convolution and full connection processing on the extracted image features so as to output target vertex information corresponding to the target geometric figure in the image to be processed.
In this embodiment, feature extraction is performed on the image to be processed based on the target segmentation network, and convolution and full connection processing is performed on the extracted image features, so that the target vertex information corresponding to the target geometric figure in the image to be processed can be accurately identified. The target geometric figure can then be segmented from the image to be processed based on the target vertex information, which improves the accuracy and precision of identification and segmentation.
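Continuing with the earlier TwoBranchSegNet sketch, the vertex-only inference path could be exercised as follows; the reshape into four (x, y) pairs assumes a quadrilateral target geometry.

```python
import torch

# Hypothetical vertex-only inference with the earlier TwoBranchSegNet sketch.
model = TwoBranchSegNet()
model.eval()
with torch.no_grad():
    _, vertex_out = model(torch.rand(1, 3, 256, 256))
target_vertices = vertex_out.view(-1, 4, 2)           # four predicted (x, y) vertices
```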
In one embodiment, the target vertex information includes target vertex coordinates; the apparatus further comprises a business processing module. The business processing module is further configured to: determining a target polygonal area formed by the target vertex coordinates; determining the intersection ratio between the area of the target polygonal area and the area of the preset area; when the intersection ratio is greater than or equal to a second threshold value, segmenting the target polygonal area formed by the target vertex coordinates from the image to be processed; and performing corresponding business processing based on the target polygonal area.
In this embodiment, when the intersection ratio between the area of the target polygonal area and the area of the preset area is greater than or equal to the second threshold value, it indicates that verification of the target polygonal area is successful, so that business processing is performed based on the information in the target polygonal area after verification succeeds, and the security of business processing can be improved.
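A sketch of this vertex-based verification, assuming the predicted polygon is rasterized with a simple even-odd point-in-polygon test and compared against a preset-area mask; both the rasterization method and the threshold are illustrative.

```python
import numpy as np

def polygon_mask(vertices, height, width):
    # Rasterize the target polygonal area given by the predicted vertex
    # coordinates with a simple even-odd point-in-polygon test.
    ys, xs = np.mgrid[0:height, 0:width]
    mask = np.zeros((height, width), dtype=bool)
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        crosses = (ys < y1) != (ys < y2)              # edge spans this pixel row
        x_cross = x1 + (ys - y1) / (y2 - y1 + 1e-9) * (x2 - x1)
        mask ^= crosses & (xs < x_cross)
    return mask

def verify_polygon(vertices, preset_mask, threshold=0.8):
    # Intersection ratio between the target polygonal area and the preset area.
    poly = polygon_mask(vertices, *preset_mask.shape)
    inter = np.logical_and(poly, preset_mask).sum()
    union = np.logical_or(poly, preset_mask).sum()
    return union > 0 and inter / union >= threshold
```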
In one embodiment, the sample image is an image including a target document, the target geometry is a quadrilateral formed based on a boundary of the target document, the predicted vertex information corresponding to the target geometry includes vertex coordinates corresponding to each of four vertices of the target document, and the target segmentation network is a document segmentation network.
In this embodiment, the document image including the target document is subjected to target segmentation processing by the document segmentation network to be trained, so that the predicted document region corresponding to the target document, as predicted by the document segmentation network, can be obtained. Based on the image features in the target segmentation process, the predicted vertex coordinates of the target document in the document image, as predicted by the document segmentation network, can also be obtained. A target loss function is constructed from the region feature loss between the predicted document region and the labeled document region and the geometric feature loss between the predicted document region and the predicted polygon region determined by the predicted vertex coordinates, so that the target loss function contains multiple loss features. Training the document segmentation network to be trained based on these various losses allows the influence of each kind of loss on identification and segmentation to be fully considered, so that the accuracy and precision of identification and segmentation by the document segmentation network can be improved through training. The target document can then be accurately identified and segmented from an image by the trained document segmentation network, and the vertex coordinates of the target document in the image can be accurately output, so that the target document in the image can be accurately located.
For specific limitations on the training apparatus of the segmentation network, reference may be made to the above limitations on the training method of the segmentation network, which are not repeated here. Each of the modules in the training apparatus of the segmentation network may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in hardware in, or independent of, a processor in the computer device, or may be stored in software in a memory of the computer device, so that the processor can call and execute the operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a server, and whose internal structure may be as shown in FIG. 11. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used for storing training data of the segmentation network. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a training method for a segmentation network.
It will be appreciated by those skilled in the art that the structure shown in FIG. 11 is merely a block diagram of part of the structure related to the solution of the present application and does not constitute a limitation on the computer device to which the solution of the present application is applied; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In an embodiment, there is also provided a computer device comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method embodiments described above when the computer program is executed.
In one embodiment, a computer-readable storage medium is provided, storing a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
In one embodiment, a computer program product or computer program is provided that includes computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs the steps in the above-described method embodiments.
Those skilled in the art will appreciate that all or part of the processes of the methods described above may be implemented by a computer program stored on a non-transitory computer-readable storage medium; when executed, the program may include the flows of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, or the like. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take various forms such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction between the combinations of these technical features, they should be considered to fall within the scope of this specification.
The above embodiments illustrate only a few implementations of the application; they are described in detail but are not therefore to be construed as limiting the scope of the application. It should be noted that several variations and modifications can be made by those skilled in the art without departing from the concept of the application, and these all fall within the scope of protection of the application. Accordingly, the scope of protection of the present application shall be determined by the appended claims.

Claims (25)

1. A method of training a segmentation network, the method comprising:
Acquiring a sample image comprising a target geometric figure, and determining a labeling geometric region corresponding to the target geometric figure based on the sample image;
Sampling the sample image to obtain a corresponding sample sampled image;
Respectively extracting features of the sample image and the sample sampled image through a segmentation network to be trained to obtain a first feature map corresponding to the sample image and a second feature map corresponding to the sample sampled image;
Performing fusion processing on the first feature map and the second feature map, and obtaining a predicted geometric area corresponding to the target geometric figure based on the fused sample fusion feature map;
Performing convolution and full connection processing on the second feature map to obtain predicted vertex information corresponding to the target geometric figure;
Determining corresponding regional characteristic loss according to the prediction geometric region and the labeling geometric region;
determining a corresponding geometric feature loss according to the predicted geometric region and the predicted polygonal region determined by the predicted vertex information;
constructing an objective loss function based on the region feature loss and the geometric feature loss;
Training the segmentation network to be trained through the target loss function until the training stopping condition is reached, and obtaining a trained target segmentation network; the target segmentation network is used for segmenting a target geometric figure from the image to be processed.
2. The method of claim 1, wherein said determining a corresponding region feature loss from the predicted geometric region and the labeled geometric region comprises:
Determining a first color value corresponding to the prediction geometric region and a second color value corresponding to the labeling geometric region;
determining a region color loss between the predicted geometric region and the labeling geometric region based on a difference of the first color value and the second color value;
Determining a first area corresponding to the predicted geometric area and a second area corresponding to the labeling geometric area;
Determining a region segmentation loss between the predicted geometric region and the labeling geometric region based on the first region area and the second region area;
and determining corresponding regional characteristic loss according to the regional color loss and the regional segmentation loss.
3. The method of claim 2, wherein determining the first color value corresponding to the predicted geometric region and the second color value corresponding to the labeled geometric region comprises:
acquiring the values of color channels corresponding to the pixel points in the prediction geometric area respectively, and acquiring the values of color channels corresponding to the pixel points in the labeling geometric area respectively;
Determining a first color value corresponding to the prediction geometric region according to the value of the color channel corresponding to each pixel point in the prediction geometric region, a first gray value corresponding to the corresponding pixel point and the first region area;
And determining a second color value corresponding to the labeling geometric region according to the value of the color channel corresponding to each pixel point in the labeling geometric region, the second gray value corresponding to the corresponding pixel point and the second region area.
4. The method of claim 2, wherein determining a first region area corresponding to the predicted geometric region and a second region area corresponding to the labeled geometric region comprises:
Acquiring first gray values corresponding to each pixel point in the prediction geometric area, and acquiring second gray values corresponding to each pixel point in the labeling geometric area;
Taking the sum of the first gray values of the pixel points in the prediction geometric area as a first area of the prediction geometric area;
and taking the sum of the second gray values of the pixel points in the labeling geometric area as the second area of the labeling geometric area.
5. The method of claim 1, wherein said determining a corresponding geometric feature loss from the predicted geometric region and a predicted polygon region determined from the predicted vertex information comprises:
Determining a first area of the predicted geometric region and calculating a third area of the predicted polygonal region determined by the predicted vertex information;
determining a geometric area loss according to the difference between the first area and the third area;
determining a first gravity center position corresponding to the predicted geometric region and a second gravity center position corresponding to the predicted polygonal region;
Determining a geometric center of gravity loss according to a distance between the first center of gravity position and the second center of gravity position;
A geometric feature loss is determined based on the geometric area loss and the geometric center of gravity loss.
6. The method of claim 5, wherein the first center of gravity position comprises a first center of gravity coordinate and the second center of gravity position comprises a second center of gravity coordinate; the determining a first gravity center position corresponding to the predicted geometric region and a second gravity center position corresponding to the predicted polygonal region comprises:
Determining a first barycentric coordinate of the prediction geometric region according to first gray values respectively corresponding to all pixel points in the prediction geometric region;
determining a second center of gravity coordinate of a predicted polygon area formed by the predicted vertex coordinates, based on the predicted vertex coordinates in the predicted vertex information;
the determining the geometric center of gravity loss according to the distance between the first center of gravity position and the second center of gravity position comprises:
a geometric center of gravity loss between the predicted geometric region and the predicted polygon region is determined based on a distance between the first center of gravity coordinate and the second center of gravity coordinate.
7. The method of claim 6, wherein determining the first barycentric coordinates of the predicted geometric region according to the first gray values respectively corresponding to the pixels in the predicted geometric region comprises:
Constructing a gray value matrix based on the first gray values respectively corresponding to the pixel points in the prediction geometric region;
Taking the comparison value of the sum of the first gray values of each row in the gray value matrix and the first area of the prediction geometric area as the abscissa in the first barycentric coordinates of the prediction geometric area;
And taking the comparison value of the sum of the first gray values of each column in the gray value matrix and the first area of the prediction geometric area as the ordinate of the first barycentric coordinates.
8. The method according to claim 1, wherein the method further comprises:
Acquiring an image to be processed, and sampling the image to be processed to obtain a corresponding sampled image to be processed;
Respectively extracting features of the image to be processed and the sampled image to be processed through the trained target segmentation network to obtain a third feature map corresponding to the image to be processed and a fourth feature map corresponding to the sampled image to be processed;
And carrying out fusion processing on the third feature map and the fourth feature map, and determining a target geometric figure in the image to be processed based on the target fusion feature map after the fusion processing.
9. The method of claim 8, wherein the method further comprises:
determining the intersection ratio between the area of the target geometric figure and the area of a preset area;
when the intersection ratio is greater than or equal to a threshold value, the target geometric figure is segmented from the image to be processed;
and carrying out corresponding business processing based on the content included in the target geometric figure.
10. The method according to claim 1, wherein the method further comprises:
Acquiring an image to be processed, and extracting the characteristics of the image to be processed through the trained target segmentation network;
And carrying out convolution and full connection processing on the extracted image features so as to output target vertex information corresponding to the target geometric figure in the image to be processed.
11. The method according to any one of claims 1 to 10, wherein the sample image is an image containing a target document, the target geometry is a quadrilateral formed based on a boundary of the target document, the predicted vertex information corresponding to the target geometry includes vertex coordinates corresponding to each of four vertices of the target document, and the target segmentation network is a document segmentation network.
12. A training apparatus for a segmentation network, the apparatus comprising:
The acquisition module is used for acquiring a sample image comprising a target geometric figure and determining an annotation geometric region corresponding to the target geometric figure based on the sample image;
The prediction module is used for sampling the sample image to obtain a corresponding sample sampled image; respectively extracting features of the sample image and the sample sampled image through a segmentation network to be trained to obtain a first feature map corresponding to the sample image and a second feature map corresponding to the sample sampled image; performing fusion processing on the first feature map and the second feature map, and obtaining a predicted geometric area corresponding to the target geometric figure based on the fused sample fusion feature map; performing convolution and full connection processing on the second feature map to obtain predicted vertex information corresponding to the target geometric figure;
The regional characteristic loss determining module is used for determining corresponding regional characteristic loss according to the prediction geometric region and the labeling geometric region;
the geometric feature loss determining module is used for determining corresponding geometric feature loss according to the predicted geometric region and the predicted polygonal region determined by the predicted vertex information;
a building module for building a target loss function based on the regional feature loss and the geometric feature loss;
The training module is used for training the segmentation network to be trained through the target loss function until the training stopping condition is reached, and obtaining a trained target segmentation network; the target segmentation network is used for segmenting a target geometric figure from the image to be processed.
13. The apparatus of claim 12, wherein the region feature loss determination module is further configured to determine a first color value corresponding to the predicted geometric region and a second color value corresponding to the labeled geometric region; determining a region color loss between the predicted geometric region and the labeling geometric region based on a difference of the first color value and the second color value; determining a first area corresponding to the predicted geometric area and a second area corresponding to the labeling geometric area; determining a region segmentation loss between the predicted geometric region and the labeling geometric region based on the first region area and the second region area; and determining corresponding regional characteristic loss according to the regional color loss and the regional segmentation loss.
14. The apparatus of claim 13, wherein the region feature loss determination module is further configured to obtain a value of a color channel corresponding to each pixel point in the prediction geometry region, and obtain a value of a color channel corresponding to each pixel point in the labeling geometry region; determining a first color value corresponding to the prediction geometric region according to the value of the color channel corresponding to each pixel point in the prediction geometric region, a first gray value corresponding to the corresponding pixel point and the first region area; and determining a second color value corresponding to the labeling geometric region according to the value of the color channel corresponding to each pixel point in the labeling geometric region, the second gray value corresponding to the corresponding pixel point and the second region area.
15. The apparatus of claim 13, wherein the region feature loss determination module is further configured to obtain a first gray value corresponding to each pixel point in the prediction geometry region, and obtain a second gray value corresponding to each pixel point in the labeling geometry region; taking the sum of the first gray values of the pixel points in the prediction geometric area as a first area of the prediction geometric area; and taking the sum of the second gray values of the pixel points in the labeling geometric area as the second area of the labeling geometric area.
16. The apparatus of claim 12, wherein the geometric feature loss determination module is further configured to determine a first region area of the predicted geometric region and calculate a third region area of the predicted polygonal region determined by the predicted vertex information; determining a geometric area loss according to the difference between the first area and the third area; determining a first gravity center position corresponding to the predicted geometric region and a second gravity center position corresponding to the predicted polygonal region; determining a geometric center of gravity loss according to a distance between the first center of gravity position and the second center of gravity position; a geometric feature loss is determined based on the geometric area loss and the geometric center of gravity loss.
17. The apparatus of claim 16, wherein the first center of gravity position comprises a first center of gravity coordinate and the second center of gravity position comprises a second center of gravity coordinate; the geometric feature loss determining module is further configured to determine a first barycentric coordinate of the predicted geometric region according to first gray values respectively corresponding to each pixel point in the predicted geometric region; determine a second center of gravity coordinate of a predicted polygon area formed by the predicted vertex coordinates based on the predicted vertex coordinates in the predicted vertex information; and determine a geometric center of gravity loss between the predicted geometric region and the predicted polygon region based on a distance between the first center of gravity coordinate and the second center of gravity coordinate.
18. The apparatus of claim 17, wherein the geometric feature loss determination module is further configured to construct a gray value matrix based on the first gray values respectively corresponding to the pixels in the predicted geometric region; taking the comparison value of the sum of the first gray values of each row in the gray value matrix and the first area of the prediction geometric area as the abscissa in the first barycentric coordinates of the prediction geometric area; and taking the comparison value of the sum of the first gray values of each column in the gray value matrix and the first area of the prediction geometric area as the ordinate of the first barycentric coordinates.
19. The apparatus of claim 12, further comprising a segmentation module; the segmentation module is used for acquiring an image to be processed, and carrying out sampling processing on the image to be processed to obtain a corresponding sampled image to be processed; respectively extracting features of the image to be processed and the sampled image to be processed through the trained target segmentation network to obtain a third feature map corresponding to the image to be processed and a fourth feature map corresponding to the sampled image to be processed; and carrying out fusion processing on the third feature map and the fourth feature map, and determining a target geometric figure in the image to be processed based on the target fusion feature map after the fusion processing.
20. The apparatus of claim 12, wherein the apparatus further comprises a business processing module; the business processing module is used for determining the intersection ratio between the area of the target geometric figure and the area of the preset area; when the intersection ratio is greater than or equal to a threshold value, the target geometric figure is segmented from the image to be processed; and carrying out corresponding business processing based on the content included in the target geometric figure.
21. The apparatus of claim 12, further comprising a vertex prediction module; the vertex predicting module is used for acquiring an image to be processed and extracting the characteristics of the image to be processed through the trained target segmentation network; and carrying out convolution and full connection processing on the extracted image features so as to output target vertex information corresponding to the target geometric figure in the image to be processed.
22. The apparatus according to any one of claims 12 to 21, wherein the sample image is an image including a target document, the target geometry is a quadrilateral formed based on a boundary of the target document, the predicted vertex information corresponding to the target geometry includes vertex coordinates corresponding to each of four vertices of the target document, and the target segmentation network is a document segmentation network.
23. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 11 when the computer program is executed.
24. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the method of any one of claims 1 to 11.
25. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 11.
GR01 Patent grant