
CN113506358A - Image annotation method, device, equipment and storage medium

Info

Publication number
CN113506358A
Authority
CN
China
Prior art keywords
supplemented, image, annotation, determining, labeling
Legal status
Pending
Application number
CN202110799731.7A
Other languages
Chinese (zh)
Inventor
王秋思
潘柳华
徐麟
Current Assignee
Shanghai Eye Control Technology Co Ltd
Original Assignee
Shanghai Eye Control Technology Co Ltd
Application filed by Shanghai Eye Control Technology Co Ltd
Priority to CN202110799731.7A
Publication of CN113506358A

Classifications

    • G06T 11/60: Editing figures and text; combining figures or text (under G06T 11/00, 2D [Two Dimensional] image generation)
    • G06F 18/253: Fusion techniques of extracted features (under G06F 18/25, fusion techniques; G06F 18/00, pattern recognition)
    • G06T 7/187: Segmentation; edge detection involving region growing, region merging or connected component labelling (under G06T 7/10, segmentation and edge detection; G06T 7/00, image analysis)
    • G06T 7/90: Determination of colour characteristics (under G06T 7/00, image analysis)

Abstract

The invention discloses an image labeling method, an image labeling device, image labeling equipment and a storage medium. The method comprises the following steps: determining prediction information for each point in an image to be supplemented and labeled according to that image and the description information of the region to be supplemented and labeled within it, and determining the category points to be supplemented according to the prediction information; determining the connected domain formed by the category points to be supplemented as the region to be supplemented and labeled; and supplementing the region to be supplemented and labeled into the corresponding position in the original labeled image to obtain the target labeled image. By determining the prediction information of each point, deriving the category points to be supplemented from it, taking the connected domain they form as the region to be supplemented and labeled, and writing that region into the corresponding position of the original labeled image, the technical scheme achieves complete labeling of the original labeled image, increases the labeling speed, and saves labeling time.

Description

Image annotation method, device, equipment and storage medium
Technical Field
The embodiment of the invention relates to an image processing technology, in particular to an image annotation method, an image annotation device, image annotation equipment and a storage medium.
Background
Image segmentation is an important and commonly used tool in image processing. It is often used in annual vehicle inspection, where it greatly assists the detection process, but the quality of the segmentation result depends on the image quality, the labeling quality and the network.
Because image labeling requires tracing along the edge of each category in an image, the workload is large. In the prior art, once part of an annotation is of poor quality or supplementary annotation is needed, the whole image has to be annotated again, so the annotation process is time-consuming.
Disclosure of Invention
The invention provides an image annotation method, device, equipment and storage medium, and aims to improve the efficiency of image supplementary annotation.
In a first aspect, an embodiment of the present invention provides an image annotation method, including:
determining the prediction information of each point in the image to be supplemented and labeled according to the image to be supplemented and labeled and the description information of the region to be supplemented and labeled in the image to be supplemented and labeled, and determining the category point to be supplemented according to each prediction information;
determining a connected domain formed by the category points to be supplemented as the region to be supplemented;
and supplementing the marking area to be supplemented to the corresponding position in the original marking image to obtain the target marking image.
Further, determining the prediction information of each point in the image to be supplemented and annotated according to the image to be supplemented and the description information of the region to be supplemented and annotated in the image to be supplemented and annotated, comprising:
and inputting the description information of the to-be-supplemented annotation image and the to-be-supplemented annotation region in the to-be-supplemented annotation image as input information into a preset annotation model, and obtaining output information as prediction information of each point in the to-be-supplemented annotation image.
Furthermore, the preset annotation model is composed of a convolutional neural module, a spatial pyramid pooling module, a text processing module, a bilinear fusion module and an attention mechanism module which are connected with one another.
Further, determining the prediction information of each point in the image to be supplemented and annotated according to the image to be supplemented and the description information of the region to be supplemented and annotated in the image to be supplemented and annotated, comprising:
inputting the image to be supplemented and labeled into the convolutional neural module to obtain the characteristic information of the image to be supplemented and labeled;
inputting the characteristic information into the spatial pyramid pooling module to obtain the visual characteristic of the image to be supplemented and labeled;
inputting the description information of the region to be supplemented and labeled in the image to be supplemented and labeled into the text processing module to obtain the text characteristics of the region to be supplemented and labeled;
fusing the visual features and the text features based on the bilinear fusion module to obtain multi-modal features;
and inputting the multi-modal characteristics into the attention mechanism module to obtain the prediction information of each point in the image to be supplemented and annotated.
Further, the prediction information includes a prediction score, and the determining of the category point to be supplemented according to each piece of prediction information includes:
comparing each prediction score with a preset threshold;
and if the prediction score is larger than the preset threshold, determining the point corresponding to the prediction score as the category point to be supplemented.
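A minimal sketch of this comparison step, assuming the prediction scores arrive as a per-point NumPy score map; the function name and the threshold value are illustrative assumptions, not the patent's prescription:

```python
import numpy as np

def select_points_to_supplement(score_map: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Mark every point whose prediction score exceeds the preset threshold.

    score_map: H x W array of per-point prediction scores output by the
               preset annotation model.
    Returns an H x W binary mask of category points to be supplemented.
    """
    return (score_map > threshold).astype(np.uint8)

# Example: points scoring above 0.5 are kept as category points.
scores = np.array([[0.1, 0.8],
                   [0.9, 0.3]])
mask = select_points_to_supplement(scores)  # [[0, 1], [1, 0]]
```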
Further, determining a connected domain formed by each category point to be supplemented as the region to be supplemented with annotations includes:
forming at least one connected domain by each category point to be supplemented;
traversing each category point to be supplemented to screen each connected domain to obtain a target connected domain;
and determining the target connected domain as the to-be-supplemented marking region.
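One way to realize this grouping and screening, sketched with OpenCV's connected-component labeling; the 8-connectivity and the preset area threshold are illustrative assumptions:

```python
import cv2
import numpy as np

def mask_to_target_regions(mask: np.ndarray, min_area: int = 50) -> np.ndarray:
    """Form connected domains from the category points and keep only those
    whose area reaches the preset area threshold.

    mask: H x W binary mask of category points to be supplemented.
    Returns a binary mask containing only the target connected domains.
    """
    num, labels, stats, _ = cv2.connectedComponentsWithStats(
        mask.astype(np.uint8), connectivity=8)
    keep = np.zeros_like(mask, dtype=np.uint8)
    for i in range(1, num):  # label 0 is the background
        if stats[i, cv2.CC_STAT_AREA] >= min_area:
            keep[labels == i] = 1
    return keep
```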
Further, supplementing the to-be-supplemented annotation region to a corresponding position in the original annotation image to obtain a target annotation image, including:
selecting a target supplementary category point from the to-be-supplemented labeling area according to a preset labeling requirement;
supplementing the target supplement type points to corresponding positions in the original annotation image, and meanwhile determining supplement gray values of all points in the to-be-supplemented annotation area;
and determining the gray value of the target area corresponding to the area to be supplemented and marked in the original marked image as the supplemented gray value to obtain the target marked image.
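A minimal sketch of this gray-value supplementation, assuming single-channel annotation images whose gray values encode the annotated categories; the names and the gray value are illustrative:

```python
import numpy as np

def supplement_annotation(original: np.ndarray, region_mask: np.ndarray,
                          supplement_gray: int) -> np.ndarray:
    """Write the supplement gray value into the target area of the original
    annotation image that corresponds to the region to be supplemented.

    original:        H x W single-channel annotation image.
    region_mask:     H x W binary mask of the region to be supplemented.
    supplement_gray: gray value assigned to the supplemented category.
    """
    target = original.copy()
    target[region_mask.astype(bool)] = supplement_gray
    return target  # the target annotation image
```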
In a second aspect, an embodiment of the present invention further provides an image annotation apparatus, including:
the device comprises a to-be-supplemented category point determining module, a to-be-supplemented annotation processing module and a to-be-supplemented annotation processing module, wherein the to-be-supplemented annotation processing module is used for determining prediction information of each point in an to-be-supplemented annotation image according to the to-be-supplemented annotation image and description information of a to-be-supplemented annotation area in the to-be-supplemented annotation image, and determining to-be-supplemented category points according to the prediction information;
a to-be-supplemented labeling area determining module, configured to determine a connected domain formed by each to-be-supplemented category point as the to-be-supplemented labeling area;
and the supplementing module is used for supplementing the to-be-supplemented labeling area to a corresponding position in the original labeling image to obtain a target labeling image.
In a third aspect, an embodiment of the present invention further provides a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the image annotation method according to any one of the first aspect when executing the program.
In a fourth aspect, the present invention further provides a storage medium containing computer-executable instructions, which when executed by a computer processor, are used for executing the image annotation method according to any one of the first aspect.
The embodiment of the invention provides an image labeling method, an image labeling device, image labeling equipment and a storage medium, wherein the method comprises the following steps: determining the prediction information of each point in the image to be supplemented and labeled according to the image to be supplemented and labeled and the description information of the region to be supplemented and labeled in the image to be supplemented and labeled, and determining the category point to be supplemented according to each prediction information; determining a connected domain formed by the category points to be supplemented as the region to be supplemented; and supplementing the marking area to be supplemented to the corresponding position in the original marking image to obtain the target marking image. According to the technical scheme, according to the image to be supplemented and the description information of the region to be supplemented and annotated in the image to be supplemented and annotated, the prediction information of each point in the image to be supplemented and annotated can be determined, the category point to be supplemented can be determined according to the prediction information, then the connected domain formed by the category points to be supplemented and determined as the region to be supplemented and annotated, the region to be supplemented and annotated can be supplemented and entered into the corresponding position in the original annotated image, the supplementary annotation of the original annotated image is realized, so that the target annotated image is obtained, the complete annotation of the image is realized, the image annotation speed is improved, and the image annotation time is further saved.
Drawings
Fig. 1 is a frame number image and a labeled image obtained by frame number segmentation according to an embodiment of the present invention;
Fig. 2 is a flowchart of an image annotation method according to an embodiment of the present invention;
Fig. 3 is a flowchart of an image annotation method according to a second embodiment of the present invention;
Fig. 4 is a schematic flow chart illustrating a process of determining a category point to be supplemented by a preset annotation model in an image annotation method according to a second embodiment of the present invention;
Fig. 5 is a schematic diagram of a preset annotation model in an image annotation method according to a second embodiment of the present invention;
Fig. 6 is a flowchart illustrating step 3130 in an image annotation method according to a second embodiment of the present invention;
Fig. 7 is a schematic view of frame number start-stop supplementary labeling in an image labeling method according to a second embodiment of the present invention;
Fig. 8 is a flowchart illustrating another image annotation method according to a second embodiment of the present invention;
Fig. 9 is a schematic structural diagram of an image annotation apparatus according to a third embodiment of the present invention;
Fig. 10 is a schematic structural diagram of a computer device according to a fourth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the operations (or steps) as a sequential process, many of the operations can be performed in parallel, concurrently or simultaneously. In addition, the order of the operations may be re-arranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like. In addition, the embodiments and features of the embodiments in the present invention may be combined with each other without conflict.
Fig. 1 shows a frame number image and the labeled image obtained by frame number segmentation according to an embodiment of the present invention. As shown in fig. 1, in the early stage of an engineering project, frame number segmentation labels only the 17 frame number characters, which include letters and digits; after the image labeling is completed, the start-stop character at the head and tail of the frame number also has to be segmented as a character class of its own, for example as the non-standard 'star'. The image labeling effect is then poor: if the start-stop sign is to be added, a large amount of manpower and material resources must be consumed to label every image again, and the labeling project takes a long time. The problem that supplementary labeling is required after labeling is finished also frequently occurs in other image segmentation data processing.
The embodiment of the invention provides an image annotation method, which aims to realize complete annotation of an image, improve the image annotation speed and further save the image annotation time. An image annotation method can be realized by the following embodiments.
Example one
Fig. 2 is a flowchart of an image annotation method according to an embodiment of the present invention. The method is applicable to situations where the annotation effect of an image is poor or an image needs supplementary annotation, and can be executed by an image annotation device. As a preferred implementation, the entire patent document takes frame number segmentation and supplementary annotation as its running example. As shown in fig. 2, the technical solution specifically includes the following steps:
step 210, determining prediction information of each point in the image to be supplemented and annotated according to the image to be supplemented and the description information of the region to be supplemented and annotated in the image to be supplemented and annotated, and determining the category point to be supplemented according to each prediction information.
A frame number image acquired by photographing the frame number can serve as the image to be supplemented with annotations; if such a frame number image is annotated directly, labeling errors easily occur. The description information of the to-be-supplemented labeling area may be description information of that area in the original annotation image obtained after the frame number image is initially annotated; for example, it may describe the position of the area, such as 'the star on the left'.
Specifically, the image to be supplemented and labeled may be a frame number image, and the description information of the region to be supplemented and labeled may be description information of a missing part in the image to be supplemented and labeled, and the description information may be determined according to the shot frame number image. And then, according to the image to be supplemented and the description information of the missing part of the image to be supplemented and labeled, obtaining the prediction information such as the prediction category and the prediction score of each point in the image to be supplemented and labeled, further, according to the prediction information of each point, determining whether each point is a point in the region to be supplemented and labeled, and further determining the point in the region to be supplemented and labeled as the point of the category to be supplemented.
In the embodiment of the invention, the image to be supplemented with the annotations can comprise at least one category point to be supplemented.
And step 220, determining a connected domain formed by the category points to be supplemented as the region to be supplemented.
Specifically, each to-be-supplemented category point may be traversed, and adjacent to-be-supplemented category points are connected to obtain a connected domain, where the connected domain may be a to-be-supplemented labeled region. Of course, in the embodiment of the present invention, each to-be-supplemented category point may form at least one to-be-supplemented labeling area.
It should be noted that before the to-be-supplemented labeling region is determined, connected domains formed by the to-be-supplemented category points may also be screened, and if the area of a connected domain is smaller than a preset area threshold, the connected domain is rejected, so that the determined area of the to-be-supplemented labeling region is greater than or equal to the preset area threshold.
In the embodiment of the invention, after traversing each to-be-supplemented category point to obtain at least one connected domain, the at least one connected domain can be screened according to the preset area threshold value to obtain the connected domain with the area larger than the preset area threshold value, and the connected domain with the area larger than the preset area threshold value is determined as the to-be-supplemented marking region.
And step 230, supplementing the to-be-supplemented annotation area to a corresponding position in the original annotation image to obtain a target annotation image.
The original annotation image can be a vehicle frame number image, namely an image obtained after the initial annotation of the to-be-supplemented annotation image, and the original annotation image has a poor annotation effect and can have phenomena such as annotation deletion and the like.
Specifically, the class point to be supplemented required in the to-be-supplemented labeling region can be selected according to the pre-stored requirements of the segmentation labeling file, and the information of the to-be-supplemented labeling region contained in the class point to be supplemented is written into the segmentation labeling file, so that the to-be-supplemented labeling region is supplemented, and the target labeling image is obtained.
Of course, in the embodiment of the present invention, it may also be determined that the to-be-supplemented annotation region corresponds to a corresponding position in the original annotation image, and a gray value of each position in the corresponding original annotation image is determined according to the gray value of each to-be-supplemented annotation region, so as to change the gray value of each position in the original annotation image into the gray value of the corresponding to-be-supplemented annotation region, thereby implementing annotation supplementation on the original annotation image and obtaining the target annotation image.
According to the image annotation method provided by the embodiment of the invention, according to an image to be supplemented and annotated and description information of a region to be supplemented and annotated in the image to be supplemented and annotated, prediction information of each point in the image to be supplemented and annotated is determined, and a category point to be supplemented is determined according to each prediction information; determining a connected domain formed by the category points to be supplemented as the region to be supplemented; and supplementing the marking area to be supplemented to the corresponding position in the original marking image to obtain the target marking image. According to the technical scheme, the to-be-supplemented type points needing to be supplemented and labeled can be determined according to the to-be-supplemented labeled image and the description information of the to-be-supplemented labeled area in the to-be-supplemented labeled image, then the to-be-supplemented type points can form a connected domain, the connected domain can be the to-be-supplemented labeled area, the to-be-supplemented labeled area can be supplemented into the corresponding position in the original labeled image, the supplementing and labeling of the original labeled image are achieved, the target labeled image is obtained, the complete labeling of the image is achieved, the image labeling speed is improved, and the image labeling time is further saved.
Example two
Fig. 3 is a flowchart of an image annotation method according to a second embodiment of the present invention, which is further embodied on the basis of the above embodiment. In this embodiment, the method may further include:
step 310, determining the prediction information of each point in the image to be supplemented and annotated according to the image to be supplemented and the description information of the region to be supplemented and annotated in the image to be supplemented and annotated, and determining the category point to be supplemented according to each prediction information.
In one embodiment, step 310 may specifically include:
and inputting the description information of the to-be-supplemented annotation image and the to-be-supplemented annotation region in the to-be-supplemented annotation image as input information into a preset annotation model, and obtaining output information as a to-be-supplemented category point.
Fig. 4 is a schematic flow chart of determining a category point to be supplemented by a preset annotation model in an image annotation method according to a second embodiment of the present invention, as shown in fig. 4, the method may include:
Step 3110, performing model training using the training image, the training information, and the real class point as training data, and calculating a loss function.
The training image can be a frame number image shot in history, the training information can be description information of a target area in a history annotation image obtained after the frame number image shot in history is subjected to initial annotation, and the real category point is a real missing point in the history annotation image.
Specifically, the training image and the training information may be input into the model as input information, and the obtained output result may be a prediction category point. The loss function may then be calculated from the predicted class points and the true class points.
Step 3120, performing network optimization based on a back propagation algorithm until the loss function converges, to obtain the preset labeling model.
In particular, the loss function may tend to converge after network optimization of the model based on sets of training images, training information, and true class points. After the loss function converges, a preset annotation model may be determined.
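A minimal sketch of this optimization loop, assuming a PyTorch model that maps an (image, description) pair to per-point scores and a loader of historical training triples; the loss function, optimizer, and convergence test are illustrative choices, not the patent's prescription:

```python
import torch
import torch.nn as nn

def train_preset_annotation_model(model: nn.Module, loader,
                                  epochs: int = 50, lr: float = 1e-4,
                                  tol: float = 1e-4) -> nn.Module:
    """Optimize the model by back propagation until the loss converges."""
    criterion = nn.BCEWithLogitsLoss()  # compares predicted vs. real category points
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    prev_loss = float("inf")
    for _ in range(epochs):
        epoch_loss = 0.0
        for image, description, true_mask in loader:
            optimizer.zero_grad()
            pred = model(image, description)  # per-point prediction scores
            loss = criterion(pred, true_mask)
            loss.backward()                   # back propagation
            optimizer.step()
            epoch_loss += loss.item()
        if abs(prev_loss - epoch_loss) < tol:  # loss function has converged
            break
        prev_loss = epoch_loss
    return model
```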
Step 3130, inputting the to-be-supplemented annotation image and the description information of the to-be-supplemented annotation region in the to-be-supplemented annotation image as input information into the preset annotation model, and obtaining the output information as the to-be-supplemented category points.
Specifically, the image to be annotated and the description information of the region to be annotated in the image to be annotated can be used as input information to be input into a preset annotation model, and the preset annotation model can determine the category point to be supplemented according to the image to be annotated and the description information of the region to be annotated.
Fig. 5 is a schematic diagram of a preset annotation model in an image annotation method according to a second embodiment of the present invention. As shown in fig. 5, the preset annotation model is composed of a convolutional neural module, a spatial pyramid pooling module, a text processing module, a bilinear fusion module, and an attention mechanism module, which are connected to each other.
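A rough PyTorch skeleton of this five-module composition; the layer sizes, vocabulary size, pyramid levels, and the element-wise product standing in for the bilinear fusion module are illustrative assumptions rather than the patented design:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PresetAnnotationModel(nn.Module):
    """Sketch of the five connected modules: convolutional neural module,
    spatial pyramid pooling, text processing, fusion, and attention."""

    def __init__(self, vocab_size: int = 5000, embed_dim: int = 128,
                 feat_dim: int = 256):
        super().__init__()
        # Convolutional neural module: extracts feature information.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 3, padding=1), nn.ReLU(),
        )
        # Spatial pyramid pooling levels: fixed-size visual features.
        self.spp_levels = (1, 2, 4)
        # Text processing module: embeds and encodes the description.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, feat_dim, batch_first=True)
        # Attention mechanism module: relates points of the fused feature.
        self.attn = nn.MultiheadAttention(feat_dim, num_heads=4, batch_first=True)
        self.score = nn.Linear(feat_dim, 1)

    def forward(self, image, tokens):
        feats = self.cnn(image)                                # B x C x H x W
        pooled = [F.adaptive_max_pool2d(feats, k).flatten(2)   # B x C x k*k
                  for k in self.spp_levels]
        visual = torch.cat(pooled, dim=2).transpose(1, 2)      # B x 21 x C
        _, (text, _) = self.lstm(self.embed(tokens))           # 1 x B x C
        # Element-wise product is a simple stand-in for the bilinear
        # fusion module described in the steps that follow.
        fused = visual * text.transpose(0, 1)                  # B x 21 x C
        attended, _ = self.attn(fused, fused, fused)
        return self.score(attended).squeeze(-1)                # per-point scores
```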
Fig. 6 is a schematic flowchart of step 3130 in an image annotation method according to a second embodiment of the present invention, as shown in fig. 6, step 3130 may further include:
Step 3131, inputting the image to be supplemented and labeled into the convolutional neural module to obtain feature information of the image to be supplemented and labeled.
Specifically, the convolutional neural network may perform feature extraction on the image to be supplemented, so as to obtain feature information of the image to be supplemented.
Step 3132, inputting the feature information into the spatial pyramid pooling module to obtain the visual features of the image to be supplemented and labeled.
Specifically, the feature information may be input into the spatial pyramid pooling module to obtain visual features of a preset size for the to-be-supplemented annotation image.
Step 3133, inputting the description information of the to-be-supplemented labeling area in the to-be-supplemented labeling image into the text processing module to obtain the text features of the to-be-supplemented labeling area.
The description information of the to-be-supplemented labeling area in the to-be-supplemented labeling image may be a description of its position, for example 'the star on the left'. Specifically, this position description can be input into the text processing module, so that the information of the to-be-supplemented labeling area is acquired automatically rather than by manual labeling, and the text features of the to-be-supplemented labeling area, i.e. an encoding of its description information, are obtained.
Step 3134, fusing the visual features and the text features based on the bilinear fusion module to obtain multi-modal features.
Specifically, the visual feature and the text feature of the to-be-supplemented labeling region can be fused to obtain a multi-modal feature, so that the to-be-supplemented labeling region can be more accurately described.
In one embodiment, the step of fusing the visual feature and the text feature based on the bilinear fusion module includes:
a) determining the visual feature vectors $v_t \in \mathbb{R}^{m}$ and the text feature vectors $w_t \in \mathbb{R}^{n}$ to obtain a fused feature matrix $b$ whose per-channel entries are the outer products $b_t = v_t w_t^{\mathsf T}$, where $T$ is the number of channels ($t = 1, \dots, T$) and $b \in \mathbb{R}^{T \times m \times n}$;
b) after performing the maximum pooling or the minimum pooling on the fused feature matrix $b$ along the channel dimension, the result is expanded into a vector; for example, maximum pooling gives $S_{\max} = \max_t b_t$ and minimum pooling gives $S_{\min} = \min_t b_t$, each an $m \times n$ matrix;
c) after the moment normalization operation and the L2 normalization operation are carried out on the expanded vector, the fused feature vector, i.e. the multi-modal feature, is obtained as $z = \operatorname{sign}(x)\sqrt{|x|} \,/\, \lVert \operatorname{sign}(x)\sqrt{|x|} \rVert_2$, where $x$ is the vector formed by stretching the matrix obtained in step b); e.g. $S_{\min}$ of size $m \times n$ is stretched into a vector of size $1 \times mn$.
The embodiment of the invention fuses the visual features and the text features based on the bilinear fusion module, with the aim of associating the two through a bilinear pooling scheme. Common fusion methods include concatenation, element-wise multiplication, element-wise addition and so on, which can be understood as operations on the visual feature matrix and the text feature matrix. The bilinear fusion method serves the same purpose, but adopts an outer-product-then-pooling scheme to compute on the visual and text feature matrices and thereby associate them.
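A NumPy sketch of steps a) to c) as reconstructed above (per-channel outer products, channel pooling, moment normalization as a signed square root, then L2 normalization); the shapes and the choice of maximum pooling are illustrative:

```python
import numpy as np

def bilinear_fuse(visual: np.ndarray, text: np.ndarray, pool: str = "max") -> np.ndarray:
    """Fuse a T x m visual feature matrix with a T x n text feature matrix.

    Builds the per-channel outer products b_t = v_t w_t^T, pools over the
    T channels, stretches the result to a vector of length m * n, then
    applies moment (signed square-root) and L2 normalization.
    """
    b = np.einsum("tm,tn->tmn", visual, text)               # T x m x n
    s = b.max(axis=0) if pool == "max" else b.min(axis=0)   # m x n
    x = s.reshape(-1)                                       # stretched to length m * n
    x = np.sign(x) * np.sqrt(np.abs(x))                     # moment normalization
    return x / (np.linalg.norm(x) + 1e-12)                  # L2 normalization

rng = np.random.default_rng(0)
z = bilinear_fuse(rng.normal(size=(16, 8)), rng.normal(size=(16, 10)))
# z has length 8 * 10 = 80 and unit L2 norm: the multi-modal feature.
```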
Step 3135, inputting the multi-modal features into the attention mechanism module to obtain the category points to be supplemented.
Specifically, the attention mechanism module can be used for matching the relationship of the associated entities in the multi-modal features, determining the prediction information of each point in the image to be supplemented and labeled, searching a feature map layer which accords with the natural language description relationship of the region to be supplemented and labeled from the multi-layer feature map layers of the multi-modal features, and highlighting the target entity, namely the required entity of the region to be supplemented and the category point to be supplemented.
In addition, the prediction information may include a prediction category and a prediction score: the prediction category indicates which character a point is predicted to belong to, and a point whose prediction score is greater than the preset threshold may be determined as a category point to be supplemented. For example, after the multi-modal features are obtained, each entity can be regarded as a node, and the attention mechanism module can match associated entities according to the visual and text features of the nodes, so as to obtain the target entity. Fig. 7 is a schematic diagram of frame number start-stop supplementary labeling in an image labeling method according to a second embodiment of the present invention. As shown in fig. 7, taking the frame number start-stop supplementary labeling as an example:
After the visual features are obtained, i.e. the characters in the figure such as the letters 'L', 'W', 'C', '6', 'Z' and the frame number start-stop character '*', the model locates the two characters at the front and the back, together with the attribute word 'letter L', according to the attribute word 'star'. The attention mechanism module then learns the relationships between the characters and combines the scores of the different categories and attributes; following the relation word 'on the left', it locates the target 'star' to the left of the letter, and by combining these category and attribute scores through the attention mechanism module, the category points to be supplemented are obtained.
And step 320, determining a connected domain formed by the category points to be supplemented as the region to be supplemented.
In one embodiment, step 320 may specifically include:
forming at least one connected domain by each category point to be supplemented; traversing each category point to be supplemented to screen each connected domain to obtain a target connected domain; and determining the target connected domain as the to-be-supplemented marking region.
Specifically, each to-be-supplemented category point may be traversed, adjacent to-be-supplemented category points are connected to obtain connected domains, then the to-be-supplemented annotation image is traversed, the connected domains are screened based on the preset area, the connected domains with the areas larger than or equal to the preset area are determined as target connected domains, and the target connected domains are determined as to-be-supplemented annotation domains.
And step 330, supplementing the to-be-supplemented annotation area to a corresponding position in the original annotation image to obtain a target annotation image.
In one embodiment, step 330 may specifically include:
selecting a target supplementary category point from the to-be-supplemented labeling area according to a preset labeling requirement; supplementing the target supplement type points to corresponding positions in the original annotation image, and meanwhile determining supplement gray values of all points in the to-be-supplemented annotation area; and determining the gray value of the target area corresponding to the area to be supplemented and marked in the original marked image as the supplemented gray value to obtain the target marked image.
Specifically, the required supplementary category point in the to-be-supplemented labeling area can be selected according to the pre-stored preset labeling requirement, and the information of the to-be-supplemented labeling area contained in the supplementary category point is written into the preset labeling file, so that the to-be-supplemented labeling area is supplemented; meanwhile, determining that the to-be-supplemented labeling area corresponds to a corresponding position in the original labeling image, and determining the gray value of each position in the corresponding original labeling image according to the gray value of each to-be-supplemented labeling area so as to change the gray value of each position in the original labeling image into the gray value of the corresponding to-be-supplemented labeling area, thereby realizing the labeling supplementation of the original labeling image and further obtaining the target labeling image.
According to the image annotation method provided by the second embodiment of the invention, according to the image to be supplemented and the description information of the region to be supplemented and annotated in the image to be supplemented and annotated, the prediction information of each point in the image to be supplemented and annotated is determined, and the category point to be supplemented is determined according to each prediction information; determining a connected domain formed by the category points to be supplemented as the region to be supplemented; and supplementing the marking area to be supplemented to the corresponding position in the original marking image to obtain the target marking image. According to the technical scheme, the to-be-supplemented type points needing to be supplemented and labeled can be determined according to the to-be-supplemented labeled image and the description information of the to-be-supplemented labeled area in the to-be-supplemented labeled image, then the to-be-supplemented type points can form a connected domain, the connected domain can be the to-be-supplemented labeled area, the to-be-supplemented labeled area can be supplemented into the corresponding position in the original labeled image, the supplementing and labeling of the original labeled image are achieved, the target labeled image is obtained, the complete labeling of the image is achieved, the image labeling speed is improved, and the image labeling time is further saved.
Fig. 8 is a flowchart of one implementation of an image annotation method according to a second embodiment of the present invention, shown by way of example. As shown in fig. 8:
and step 810, inputting the historical frame number image, the description information of the area to be supplemented and labeled in the historical frame number image and the real category point as training data into a preset labeling model.
And 811, determining the characteristic information of the historical frame number image by a convolutional neural module of a preset labeling model, and sending the characteristic information to a spatial pyramid pooling module.
And step 812, the spatial pyramid pooling module determines the visual characteristics of the historical frame number image according to the characteristic information and sends the visual characteristics to the bilinear fusion module.
Step 813, the text processing module of the preset labeling model determines the text features of the region to be supplemented with the label according to the description information, and sends the text features to the bilinear fusion module.
And 814, fusing the visual features and the text features by the bilinear fusion module to obtain multi-modal features, and sending the multi-modal features to the attention machine modeling module.
Step 815, the attention mechanism module determines a prediction category point according to the multi-modal features.
And 816, determining a loss function according to the real category point and the prediction category point, performing network optimization based on a back propagation algorithm until the loss function is converged, and determining a preset labeling model.
Step 817, inputting the description information of the frame number image to be supplemented and the description information of the to-be-supplemented labeling area in the to-be-supplemented labeling frame number image as input information into a preset labeling model, and obtaining output information as a to-be-supplemented category point.
And 818, determining a connected domain formed by the category points to be supplemented as a region to be supplemented.
Step 819, according to the preset labeling requirement, selecting a target supplementary category point from the to-be-supplemented labeling area; supplementing the target supplement type points to corresponding positions in the original annotation image, and meanwhile determining supplement gray values of all points in the to-be-supplemented annotation area; and determining the gray value of the target area corresponding to the area to be supplemented in the original annotation image as a supplement gray value to obtain the target annotation image.
In the implementation manner of the image annotation method provided by the second embodiment of the present invention, according to the description information of the to-be-supplemented annotated image and the to-be-supplemented annotated region in the to-be-supplemented annotated image, the to-be-supplemented category points that need to be supplemented and annotated can be determined, and then the to-be-supplemented category points can be formed into a connected domain, where the connected domain can be the to-be-supplemented annotated region, so that the to-be-supplemented annotated region can be supplemented into a corresponding position in the original annotated image, so as to implement the supplementary annotation of the original annotated image, so as to obtain the target annotated image, thereby implementing the complete annotation of the image, increasing the annotation speed of the image, and further saving the annotation time of the image.
Example three
Fig. 9 is a schematic structural diagram of an image annotation device according to a third embodiment of the present invention. The device is suitable for situations where the annotation effect of an image is poor or an image needs supplementary annotation, and improves image annotation efficiency. The device may be implemented by software and/or hardware and is typically integrated in a computer device.
As shown in fig. 9, the apparatus includes:
a to-be-supplemented category point determining module 910, configured to determine, according to an to-be-supplemented annotated image and description information of an to-be-supplemented annotated region in the to-be-supplemented annotated image, prediction information of each point in the to-be-supplemented annotated image, and determine, according to each prediction information, a to-be-supplemented category point;
a to-be-supplemented labeling area determining module 920, configured to determine a connected domain formed by each to-be-supplemented category point as the to-be-supplemented labeling area;
a supplementing module 930, configured to supplement the to-be-supplemented annotation region to a corresponding position in the original annotation image, so as to obtain a target annotation image.
The image annotation device provided in this embodiment determines, according to an image to be annotated and description information of a region to be annotated in the image to be annotated, prediction information of each point in the image to be annotated, and determines a category point to be annotated according to each of the prediction information; determining a connected domain formed by the category points to be supplemented as the region to be supplemented; and supplementing the marking area to be supplemented to the corresponding position in the original marking image to obtain the target marking image. According to the technical scheme, according to the image to be supplemented and the description information of the region to be supplemented and annotated in the image to be supplemented and annotated, the prediction information of each point in the image to be supplemented and annotated can be determined, the category point to be supplemented can be determined according to the prediction information, then the connected domain formed by the category points to be supplemented and determined as the region to be supplemented and annotated, the region to be supplemented and annotated can be supplemented and entered into the corresponding position in the original annotated image, the supplementary annotation of the original annotated image is realized, so that the target annotated image is obtained, the complete annotation of the image is realized, the image annotation speed is improved, and the image annotation time is further saved.
On the basis of the foregoing embodiment, the to-be-supplemented category point determining module 910 is specifically configured to:
and inputting the description information of the to-be-supplemented annotation image and the to-be-supplemented annotation region in the to-be-supplemented annotation image as input information into a preset annotation model, and obtaining output information as a to-be-supplemented category point.
On the basis of the above embodiment, the preset annotation model is composed of a convolutional neural module, a spatial pyramid pooling module, a text processing module, a bilinear fusion module, and an attention mechanism module, which are connected to each other.
On the basis of the foregoing embodiment, the to-be-supplemented category point determining module 910 is further specifically configured to:
inputting the image to be supplemented and labeled into the convolutional neural module to obtain the characteristic information of the image to be supplemented and labeled;
inputting the characteristic information into the spatial pyramid pooling module to obtain the visual characteristic of the image to be supplemented and labeled;
inputting the description information of the to-be-supplemented labeling area in the to-be-supplemented labeling image into the text processing module to obtain the text characteristics of the to-be-supplemented labeling area;
fusing the visual features and the text features based on the bilinear fusion module to obtain multi-modal features;
and inputting the multi-modal characteristics into the attention mechanism module to obtain the prediction information of each point in the to-be-supplemented annotated image.
On the basis of the above embodiment, the prediction information includes a prediction score, and determining the category point to be supplemented according to each piece of prediction information includes:
comparing each prediction score with a preset threshold;
and if the prediction score is larger than the preset threshold, determining the point corresponding to the prediction score as the category point to be supplemented.
On the basis of the foregoing embodiment, the module 920 for determining a to-be-supplemented labeled area is specifically configured to:
forming at least one connected domain by each category point to be supplemented;
traversing each category point to be supplemented to screen each connected domain to obtain a target connected domain;
and determining the target connected domain as the to-be-supplemented marking region.
On the basis of the foregoing embodiment, the supplement module 930 is specifically configured to:
selecting a target supplementary category point from the to-be-supplemented labeling area according to a preset labeling requirement;
supplementing the target supplement type points to corresponding positions in the original annotation image, and meanwhile determining supplement gray values of all points in the to-be-supplemented annotation area; and determining the gray value of the target area corresponding to the area to be supplemented and marked in the original marked image as the supplemented gray value to obtain the target marked image.
The image annotation device provided by the embodiment of the invention can execute the image annotation method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
Example four
Fig. 10 is a schematic structural diagram of a computer apparatus according to a fourth embodiment of the present invention, as shown in fig. 10, the computer apparatus includes a processor 1010, a memory 1020, an input device 1030, and an output device 1040; the number of the processors 1010 in the computer device may be one or more, and one processor 1010 is taken as an example in fig. 10; the processor 1010, the memory 1020, the input device 1030, and the output device 1040 in the computer apparatus may be connected by a bus or other means, and fig. 10 illustrates an example of connection by a bus.
The memory 1020, which is a computer-readable storage medium, can be used for storing software programs, computer-executable programs, and modules, such as program instructions/modules corresponding to the image annotation method in the embodiment of the present invention (for example, the category point to be supplemented determining module 910, the region to be supplemented determining module 920, and the supplementary module 930 in the image annotation device). The processor 1010 executes various functional applications and data processing of the computer device by executing software programs, instructions and modules stored in the memory 1020, so as to implement the image annotation method described above.
The processor 1010 may include one or more central processing units (CPUs), and the computer device may also include multiple processors 1010. Each processor 1010 may be a single-core or multi-core processor. Herein, processor 1010 may refer to one or more devices, circuits, and/or processing cores for processing data (e.g., computer program instructions).
The memory 1020 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the terminal, and the like. Further, the memory 1020 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, the memory 1020 may further include memory located remotely from the processor 1010, which may be connected to a computer device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 1030 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the computer apparatus. Output device 1040 may include a display device such as a display screen.
The computer device provided by the embodiment of the invention can execute the image annotation method provided by the embodiment, and has corresponding functions and beneficial effects.
Example five
An embodiment of the present invention further provides a storage medium containing computer-executable instructions, which when executed by a computer processor, are configured to perform an image annotation method, including:
determining the prediction information of each point in the image to be supplemented and labeled according to the image to be supplemented and labeled and the description information of the region to be supplemented and labeled in the image to be supplemented and labeled, and determining the category point to be supplemented according to each prediction information;
determining a connected domain formed by the category points to be supplemented as the region to be supplemented;
and supplementing the marking area to be supplemented to the corresponding position in the original marking image to obtain the target marking image.
The computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of the computer readable storage medium include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a RAM, a ROM, an erasable programmable read-only memory (EPROM), a register, an optical fiber, a CD-ROM, an optical storage device, a magnetic storage device, any suitable combination of the foregoing, or any other form of computer readable storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an application specific integrated circuit (ASIC). In embodiments of the present application, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
Of course, the storage medium provided by the embodiment of the present invention contains computer-executable instructions, and the computer-executable instructions are not limited to the method operations described above, and may also perform related operations in the image annotation method provided by any embodiment of the present invention.
From the above description of the embodiments, it is obvious for those skilled in the art that the present invention can be implemented by software and necessary general hardware, and certainly, can also be implemented by hardware, but the former is a better embodiment in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the methods according to the embodiments of the present invention.
It should be noted that, in the embodiment of the image annotation device, the included units and modules are only divided according to functional logic, but are not limited to the above division as long as the corresponding functions can be realized; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (10)

1. An image annotation method, comprising:
determining prediction information of each point in a to-be-supplemented annotation image according to the to-be-supplemented annotation image and description information of a to-be-supplemented annotation region in the to-be-supplemented annotation image, and determining to-be-supplemented category points according to each piece of the prediction information;
determining a connected domain formed by the to-be-supplemented category points as the to-be-supplemented annotation region;
and supplementing the to-be-supplemented annotation region to a corresponding position in an original annotation image to obtain a target annotation image.
2. The image annotation method according to claim 1, wherein determining the prediction information of each point in the to-be-supplemented annotation image according to the to-be-supplemented annotation image and the description information of the to-be-supplemented annotation region in the to-be-supplemented annotation image, and determining the to-be-supplemented category points according to each piece of the prediction information comprises:
inputting the to-be-supplemented annotation image and the description information of the to-be-supplemented annotation region in the to-be-supplemented annotation image into a preset annotation model, and taking the output information as the to-be-supplemented category points.
3. The image annotation method according to claim 2, wherein the preset annotation model is composed of a convolutional neural network module, a spatial pyramid pooling module, a text processing module, a bilinear fusion module, and an attention mechanism module that are connected to each other.
4. The image annotation method according to claim 3, wherein determining the prediction information of each point in the to-be-supplemented annotation image according to the to-be-supplemented annotation image and the description information of the to-be-supplemented annotation region in the to-be-supplemented annotation image, and determining the to-be-supplemented category points according to each piece of the prediction information comprises:
inputting the to-be-supplemented annotation image into the convolutional neural network module to obtain feature information of the to-be-supplemented annotation image;
inputting the feature information into the spatial pyramid pooling module to obtain visual features of the to-be-supplemented annotation image;
inputting the description information of the to-be-supplemented annotation region in the to-be-supplemented annotation image into the text processing module to obtain text features of the to-be-supplemented annotation region;
fusing the visual features and the text features based on the bilinear fusion module to obtain multi-modal features;
and inputting the multi-modal features into the attention mechanism module to obtain the to-be-supplemented category points.
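By way of illustration only: claims 3 and 4 fix the five module types and the order in which data flows through them, but no concrete architecture. The following PyTorch sketch shows one way the modules could be wired together; the layer widths, the GRU used as the text processing module, the four attention heads, and the sigmoid scoring head are all assumptions made for this sketch, not details taken from the disclosure.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PresetAnnotationModel(nn.Module):
    # Illustrative wiring of the five modules recited in claims 3 and 4.
    def __init__(self, vocab_size=10000, text_dim=256, vis_dim=256):
        super().__init__()
        # Convolutional neural network module: image -> feature information.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, vis_dim, 3, stride=2, padding=1), nn.ReLU())
        # Spatial pyramid pooling module: pool the features at several grid
        # sizes and merge them back into one visual feature map.
        self.scales = (1, 2, 4)
        self.spp_proj = nn.Conv2d(vis_dim * len(self.scales), vis_dim, 1)
        # Text processing module: description text -> text features.
        self.embed = nn.Embedding(vocab_size, text_dim)
        self.gru = nn.GRU(text_dim, text_dim, batch_first=True)
        # Bilinear fusion module: visual + text -> multi-modal features.
        self.fuse = nn.Bilinear(vis_dim, text_dim, vis_dim)
        # Attention mechanism module and scoring head: multi-modal
        # features -> per-point prediction score.
        self.attn = nn.MultiheadAttention(vis_dim, num_heads=4,
                                          batch_first=True)
        self.score = nn.Linear(vis_dim, 1)

    def forward(self, image, token_ids):
        feat = self.cnn(image)                               # B x C x h x w
        pooled = [F.interpolate(F.adaptive_avg_pool2d(feat, s),
                                size=feat.shape[-2:], mode='nearest')
                  for s in self.scales]
        visual = self.spp_proj(torch.cat(pooled, dim=1))     # B x C x h x w
        _, text = self.gru(self.embed(token_ids))            # 1 x B x D
        text = text.squeeze(0)                               # B x D
        b, c, h, w = visual.shape
        vis_seq = visual.flatten(2).transpose(1, 2).contiguous()   # B x hw x C
        txt_seq = text.unsqueeze(1).expand(-1, h * w, -1).contiguous()
        fused = self.fuse(vis_seq, txt_seq)                  # B x hw x C
        attended, _ = self.attn(fused, fused, fused)         # self-attention
        scores = torch.sigmoid(self.score(attended))         # B x hw x 1
        return scores.view(b, h, w)                          # score map

Under these assumptions, model(image, token_ids) takes a B x 3 x H x W image batch and a B x T batch of token ids and returns a B x (H/4) x (W/4) map of prediction scores, i.e. the per-point prediction information that claim 5 thresholds.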
5. The image annotation method according to claim 1, wherein the prediction information includes a prediction score, and determining the to-be-supplemented category points according to each piece of the prediction information comprises:
comparing each prediction score with a preset threshold;
and if a prediction score is greater than the preset threshold, determining the point corresponding to that prediction score as a to-be-supplemented category point.
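By way of illustration only, the comparison recited in claim 5 is an element-wise threshold test; in the sketch below the preset threshold value of 0.5 is an arbitrary assumption, since the claim leaves the value open.

import numpy as np

def to_be_supplemented_points(score_map: np.ndarray,
                              threshold: float = 0.5) -> np.ndarray:
    # True marks a point whose prediction score is greater than the
    # preset threshold, i.e. a to-be-supplemented category point.
    return score_map > threshold

Applied to the score map produced by a model such as the sketch after claim 4, this yields the boolean point mask that claim 6 then groups into connected domains.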
6. The image annotation method according to claim 1, wherein determining a connected domain formed by the to-be-supplemented category points as the to-be-supplemented annotation region comprises:
forming the to-be-supplemented category points into at least one connected domain;
traversing the to-be-supplemented category points to screen the connected domains and obtain a target connected domain;
and determining the target connected domain as the to-be-supplemented annotation region.
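By way of illustration only: claim 6 does not fix the screening criterion, so the sketch below uses scipy.ndimage.label for the connected-domain grouping and adopts keep-the-largest-domain as an assumed screening rule.

import numpy as np
from scipy import ndimage

def target_connected_domain(points: np.ndarray) -> np.ndarray:
    # Group the to-be-supplemented category points into 4-connected domains.
    labeled, num = ndimage.label(points)
    if num == 0:
        return np.zeros_like(points, dtype=bool)
    # Screen the domains; bin 0 of the histogram counts background pixels.
    sizes = np.bincount(labeled.ravel())
    target_label = sizes[1:].argmax() + 1      # label of the largest domain
    return labeled == target_label             # the target connected domain

A size- or shape-based screen could replace the argmax here without changing the structure of the step.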
7. The image annotation method according to claim 1, wherein supplementing the to-be-supplemented annotation region to a corresponding position in the original annotation image to obtain the target annotation image comprises:
selecting target supplementary category points from the to-be-supplemented annotation region according to a preset annotation requirement;
supplementing the target supplementary category points to corresponding positions in the original annotation image while determining a supplementary gray value for the points of the to-be-supplemented annotation region; and setting the gray value of the target region in the original annotation image corresponding to the to-be-supplemented annotation region to the supplementary gray value, so as to obtain the target annotation image.
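By way of illustration only, the final step of claim 7 amounts to writing one gray value into a masked region; passing the supplementary gray value in as a parameter, rather than deriving it from a preset annotation requirement, is a simplification made for this sketch.

import numpy as np

def supplement_annotation(original: np.ndarray, region_mask: np.ndarray,
                          gray_value: int) -> np.ndarray:
    # Copy so the original annotation image is left intact, then set the
    # target region corresponding to the to-be-supplemented annotation
    # region to the supplementary gray value.
    target = original.copy()
    target[region_mask] = gray_value
    return target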
8. An image annotation device, comprising:
a to-be-supplemented category point determining module, configured to determine prediction information of each point in a to-be-supplemented annotation image according to the to-be-supplemented annotation image and description information of a to-be-supplemented annotation region in the to-be-supplemented annotation image, and determine to-be-supplemented category points according to the prediction information;
a to-be-supplemented annotation region determining module, configured to determine a connected domain formed by the to-be-supplemented category points as the to-be-supplemented annotation region;
and a supplementing module, configured to supplement the to-be-supplemented annotation region to a corresponding position in an original annotation image to obtain a target annotation image.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the image annotation method according to any one of claims 1 to 7 when executing the program.
10. A storage medium containing computer executable instructions for performing the image annotation method of any one of claims 1 to 7 when executed by a computer processor.
CN202110799731.7A (pending): Image annotation method, device, equipment and storage medium; priority date 2021-07-15, filing date 2021-07-15, published as CN113506358A.

Priority Applications (1)

Application Number: CN202110799731.7A
Priority Date / Filing Date: 2021-07-15 / 2021-07-15
Title: Image annotation method, device, equipment and storage medium


Publications (1)

Publication Number: CN113506358A
Publication Date: 2021-10-15

Family

ID=78012849

Family Applications (1)

Application Number: CN202110799731.7A
Title: Image annotation method, device, equipment and storage medium
Priority Date / Filing Date: 2021-07-15 / 2021-07-15
Status: Pending

Country Status (1)

Country: CN
Link: CN113506358A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110135307A (en) * 2019-04-30 2019-08-16 北京邮电大学 Traffic sign detection method and device based on attention mechanism
CN110929729A (en) * 2020-02-18 2020-03-27 北京海天瑞声科技股份有限公司 Image annotation method, image annotation device and computer storage medium
WO2020135056A1 (en) * 2018-12-29 2020-07-02 北京金山安全软件有限公司 Image processing method and device
CN111860553A (en) * 2019-04-28 2020-10-30 普天信息技术有限公司 Image annotation method and device
CN112287763A (en) * 2020-09-27 2021-01-29 北京旷视科技有限公司 Image processing method, apparatus, device and medium
CN112926654A (en) * 2021-02-25 2021-06-08 平安银行股份有限公司 Pre-labeling model training and certificate pre-labeling method, device, equipment and medium
WO2021135254A1 (en) * 2019-12-31 2021-07-08 深圳云天励飞技术股份有限公司 License plate number recognition method and apparatus, electronic device, and storage medium


Similar Documents

Publication Title
CN112966522B (en) Image classification method and device, electronic equipment and storage medium
EP3889830A1 (en) Cross-modality processing method and apparatus, electronic device and computer storage medium
JP7423715B2 (en) Text extraction method, text extraction model training method, device and equipment
CN111428457B (en) Automatic formatting of data tables
AU2020202601A1 (en) Utilizing object attribute detection models to automatically select instances of detected objects in images
CN111444346B (en) Word vector confrontation sample generation method and device for text classification
CN111309910A (en) Text information mining method and device
CN113449821B (en) Intelligent training method, device, equipment and medium fusing semantics and image characteristics
CN111259772A (en) Image annotation method, device, equipment and medium
CN114612921B (en) Form recognition method and device, electronic equipment and computer readable medium
CN113360654A (en) Text classification method and device, electronic equipment and readable storage medium
CN114896395A (en) Language model fine-tuning method, text classification method, device and equipment
CN114528374A (en) Movie comment emotion classification method and device based on graph neural network
CN112001399A (en) Image scene classification method and device based on local feature saliency
CN113887615A (en) Image processing method, apparatus, device and medium
CN117520343A (en) Information extraction methods, servers and storage media
CN113158630A (en) Text editing image method, storage medium, electronic device and system
CN114494693B (en) Method and device for carrying out semantic segmentation on image
CN113506358A (en) Image annotation method, device, equipment and storage medium
CN111966836A (en) Knowledge graph vector representation method and device, computer equipment and storage medium
CN117932058A (en) Emotion recognition method, device and equipment based on text analysis
CN115033699B (en) Fund user classification method and device
CN110610001A (en) Short text integrity identification method and device, storage medium and computer equipment
CN112801045B (en) A text area detection method, electronic device and computer storage medium
CN113887394B (en) Image processing method, device, equipment and storage medium

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination