Disclosure of Invention
The invention provides a method, an apparatus, a device and a storage medium for detecting salient objects in images, and aims to solve the problem that the prior art provides no effective method for detecting salient objects in images, resulting in low detection speed and low recognition efficiency for salient objects in images of various scenes.
In one aspect, the present invention provides a method for detecting a salient object in an image, the method comprising the steps of:
acquiring an image to be detected;
detecting the image to be detected through a saliency detection model to obtain all salient objects in the image to be detected;
calculating a saliency score of each of the salient objects separately; and
performing saliency sorting on all the salient objects according to the saliency scores to obtain the salient object with the maximum saliency score, and determining that salient object as a target salient object in the image to be detected.
Preferably, the step of calculating a saliency score of each said salient object separately comprises:
calculating a first saliency score of each salient object separately, and calculating a first saliency mean according to the obtained first saliency scores of all the salient objects;
determining a saliency threshold according to the first saliency mean;
clipping the contour region of each salient object separately according to the saliency threshold;
calculating a second saliency mean according to all the clipped salient objects; and
calculating a second saliency score of each salient object separately according to the calculated second saliency mean and a preset scale coefficient, and determining the obtained second saliency score as the saliency score.
Preferably, the method further comprises:
performing learning training of the mapping relation between an image and the salient objects in the image on a preset neural network through preset training data to obtain the saliency detection model, wherein the training data comprise an image data set without salient objects and an image data set containing salient objects.
Further preferably, the preset neural network is a U-Net network and/or a classical saliency detection network.
Further preferably, the U-Net network comprises a downsampling layer containing a skip connection module, the skip connection module comprising a depthwise separable convolution layer and a max pooling layer.
In another aspect, the present invention provides a salient object detection apparatus for an image, the apparatus comprising:
a detection image acquisition unit, configured to acquire an image to be detected;
a salient object obtaining unit, configured to detect the image to be detected through a saliency detection model to obtain all salient objects in the image to be detected;
a saliency score calculation unit, configured to calculate a saliency score of each of the salient objects separately; and
a saliency sorting unit, configured to perform saliency sorting on all the salient objects according to the saliency scores to obtain the salient object with the maximum saliency score and determine that salient object as a target salient object in the image to be detected.
Preferably, the saliency score calculation unit includes:
a first mean calculation unit, configured to calculate a first saliency score of each of the salient objects separately, and calculate a first saliency mean according to the obtained first saliency scores of all the salient objects;
a threshold determination unit, configured to determine a saliency threshold according to the first saliency mean;
a region clipping unit, configured to clip the contour region of each salient object separately according to the saliency threshold;
a second mean calculation unit, configured to calculate a second saliency mean according to all the clipped salient objects; and
a score calculation unit, configured to calculate a second saliency score of each salient object separately according to the calculated second saliency mean and a preset scale coefficient, and determine the obtained second saliency score as the saliency score.
Preferably, the apparatus further comprises:
a detection model training unit, configured to perform learning training of the mapping relation between an image and the salient objects in the image on a preset neural network through preset training data to obtain the saliency detection model, wherein the training data comprise an image data set without salient objects and an image data set containing salient objects.
In another aspect, the present invention further provides an image processing device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the above salient object detection method when executing the computer program.
In another aspect, the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above salient object detection method.
According to the invention, an image to be detected is acquired and detected through a saliency detection model to obtain all salient objects in the image to be detected; the saliency score of each salient object is then calculated separately, all salient objects are sorted by saliency according to the saliency scores, and the salient object with the largest score after sorting is determined as the target salient object in the image to be detected, thereby improving the recognition speed and recognition accuracy for salient objects in multi-scene images.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The following detailed description of specific implementations of the present invention is provided in conjunction with specific embodiments:
Example one:
Fig. 1 shows the implementation flow of the salient object detection method for an image provided by the first embodiment of the present invention; for convenience of description, only the portions relevant to the first embodiment of the present invention are shown, detailed as follows:
in step S101, an image to be detected is acquired.
The embodiment of the invention is applicable to image processing devices for image display, acquisition and the like. In the embodiment of the present invention, the image to be detected may be captured in real time by a mobile electronic device with a camera, or may be acquired by the image processing device from a preset storage location (e.g., a cloud storage space).
In step S102, the image to be detected is detected through the saliency detection model to obtain all salient objects in the image to be detected.
In the embodiment of the invention, the acquired image to be detected is detected through the saliency detection model to obtain all salient objects in the image to be detected, together with the relevant attribute information of each salient object (such as its contour region, position information, and color).
When the image to be detected is detected through the saliency detection model, preferably, feature extraction and image segmentation are performed on the input image through a U-Net network and/or a classical saliency detection network to obtain all salient objects in the image to be detected, thereby improving the saliency distinctiveness and accuracy of the detection.
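The following is a minimal inference sketch, not the patented implementation: it assumes a hypothetical model object `model` whose `predict` method maps an H×W×3 BGR image to an H×W saliency map in [0, 1], and it uses OpenCV connected components as one way to split the binarized map into individual salient objects together with their contour region, position and color attributes.

```python
import cv2
import numpy as np

def detect_salient_objects(image_bgr, model, bin_thresh=0.5):
    saliency_map = model.predict(image_bgr)          # hypothetical API: H x W map in [0, 1]
    mask = (saliency_map >= bin_thresh).astype(np.uint8)
    num, labels, stats, centroids = cv2.connectedComponentsWithStats(mask)

    objects = []
    for idx in range(1, num):                        # label 0 is the background
        region = (labels == idx)
        x, y, w, h, area = stats[idx]
        objects.append({
            "mask": region,                          # contour region of the object
            "bbox": (x, y, w, h),                    # position information
            "centroid": tuple(centroids[idx]),
            "mean_color": image_bgr[region].mean(axis=0),
            "mean_saliency": float(saliency_map[region].mean()),
        })
    return objects
```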
In step S103, a saliency score is calculated for each salient object, respectively.
In the embodiment of the present invention, the saliency score of each salient object is calculated separately according to the relevant attribute information of that salient object, and the saliency score is computed at the pixel level.
Preferably, the calculation of the saliency score for each salient object is achieved by:
(1) Calculating a first saliency score of each salient object separately, and calculating a first saliency mean according to the obtained first saliency scores of all salient objects.
In the embodiment of the present invention, the first saliency score of each salient object is calculated according to the relevant attribute information of that salient object, and the first saliency mean is calculated from the obtained first saliency scores of all salient objects.
As an example, the size of each salient object relative to the image to be detected and the color difference between each salient object and the image to be detected may be determined from the contour region, position information and color of the salient object; a first saliency score of each salient object may then be determined from this relative size and color difference, and finally the average of the first saliency scores of all salient objects may be computed to obtain the first saliency mean.
(2) Determining a saliency threshold according to the first saliency mean.
In the embodiment of the present invention, the saliency threshold is determined according to the first saliency mean and is smaller than the first saliency mean; for example, if the first saliency mean is M0, the saliency threshold may be M1 = 0.2 × M0.
(3) Clipping the contour region of each salient object separately according to the saliency threshold.
In the embodiment of the invention, the contour region of each salient object is clipped separately according to the saliency threshold, and only the part of each contour region whose saliency is higher than the current saliency threshold is retained.
(4) Calculating a second saliency mean according to all the clipped salient objects.
In the embodiment of the invention, the saliency of each salient object is recalculated from the retained, clipped contour region of that object, and the second saliency mean is calculated from these recalculated values.
(5) Calculating a second saliency score of each salient object separately according to the calculated second saliency mean and a preset scale coefficient, and determining the obtained second saliency score as the saliency score.
In the embodiment of the invention, the scale coefficient is determined by the area of each salient object, with a larger area giving a larger scale coefficient; the second saliency score of each salient object is obtained by multiplying the calculated second saliency mean by the scale coefficient of that object.
The calculation of the saliency score of each salient object is realized through steps (1) to (5), so that the priority among multiple salient objects in one image is clarified by comparing them against each other; a sketch of these steps is given below.
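The following sketch scores the objects produced by the detection stage. It is a sketch under stated assumptions, not the claimed method itself: the first score is taken here as the mean pixel saliency inside each contour region (the relative size and color difference mentioned above could be folded in as additional weights), the 0.2 threshold ratio follows the example given for M1, and the area-based scale coefficient is one illustrative choice.

```python
import numpy as np

def score_salient_objects(objects, saliency_map, thresh_ratio=0.2):
    image_area = saliency_map.size

    # (1) first saliency score per object (mean pixel saliency inside the
    #     contour region), then the first saliency mean M0
    first_scores = [float(saliency_map[obj["mask"]].mean()) for obj in objects]
    first_mean = float(np.mean(first_scores))

    # (2) saliency threshold below the first mean, e.g. M1 = 0.2 * M0
    threshold = thresh_ratio * first_mean

    # (3) + (4) clip each contour region to the pixels above M1,
    #           then compute the second saliency mean over the clipped objects
    clipped_means = []
    for obj in objects:
        pixels = saliency_map[obj["mask"]]
        kept = pixels[pixels > threshold]
        clipped_means.append(float(kept.mean()) if kept.size else 0.0)
    second_mean = float(np.mean(clipped_means))

    # (5) final score = second mean x area-dependent scale coefficient
    for obj in objects:
        scale = 1.0 + obj["mask"].sum() / image_area   # larger area -> larger coefficient
        obj["saliency_score"] = second_mean * scale
    return objects
```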
In step S104, all salient objects are subjected to saliency sorting according to the saliency scores, and the salient object with the highest score value after sorting is determined as the target salient object in the image to be detected.
In the embodiment of the invention, all salient objects are sorted by their saliency scores, in ascending or descending order; the salient object with the largest saliency score is the most salient target object in the current image to be detected and is determined as the target salient object in the image to be detected.
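A one-line sketch of this sorting step, assuming the `saliency_score` field produced by the scoring sketch above:

```python
# Pick the target salient object: sort by saliency score in descending order
# and take the top-ranked object (assumes the "saliency_score" field above).
ranked = sorted(objects, key=lambda obj: obj["saliency_score"], reverse=True)
target_object = ranked[0] if ranked else None
```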
In the embodiment of the invention, all salient objects in the image to be detected are detected through the saliency detection model, the saliency score of each salient object is calculated separately, and all salient objects are sorted by saliency according to the saliency scores to obtain the most salient target object in the image to be detected, thereby improving the recognition speed and recognition accuracy for salient objects in multi-scene images.
Example two:
fig. 2 shows an implementation flow of a salient object detection method for an image according to a second embodiment of the present invention, and for convenience of description, only the relevant portions of the embodiment of the present invention are shown, which is detailed as follows:
In step S201, learning training of the mapping relation between an image and the salient objects in the image is performed on a preset neural network through preset training data to obtain a saliency detection model, where the training data include an image data set without salient objects and an image data set containing salient objects.
The embodiment of the invention is applicable to image processing devices for image display, acquisition and the like. In the embodiment of the present invention, the training data composed of the image data set without salient objects and the image data set containing salient objects may be a standard data set, such as the ImageNet data set, or a customized image training data set, where an image in the data set containing salient objects may contain one or more salient objects. When the preset neural network is trained, the fine contour of each salient object is first annotated manually on the image data set containing salient objects, but the annotated salient objects are not divided into specific categories; that is, all salient objects are assigned to one class and the other, non-salient regions of the image are assigned to another class, yielding image pairs of an image and its saliency result. The annotated image data set and the image data set without salient objects are then used to train the preset neural network on the mapping relation between an image and the salient objects in the image, obtaining the saliency detection model and thereby improving the training speed and training effect of the network.
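A minimal training sketch is given below, not the exact procedure of the embodiment: the (image, mask) dataset and the network `model` are assumed to be supplied by the caller, and the binary cross-entropy loss is an assumption; the only property taken from the text is that every pixel is labeled as salient (1) or non-salient (0), without distinguishing object categories.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def train_saliency_model(model, dataset, epochs=30, lr=1e-3):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device)
    loader = DataLoader(dataset, batch_size=8, shuffle=True)
    criterion = nn.BCEWithLogitsLoss()               # binary: salient vs. non-salient
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)

    for epoch in range(epochs):
        for images, masks in loader:                 # masks: 1 = salient, 0 = background
            images, masks = images.to(device), masks.to(device)
            logits = model(images)                   # predicted saliency map (logits)
            loss = criterion(logits, masks.float())
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```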
Preferably, the preset neural network is a U-Net network and/or a classical saliency detection network, so as to improve the saliency distinctiveness and accuracy of the detection.
Further preferably, the U-Net network is an improved U-Net network including a skip connection module in its downsampling layers, the skip connection module comprising a depthwise separable convolution layer (SepConv) and a max pooling layer (Max Pooling), so that details of small-target salient objects in an image are not lost excessively during downsampling by the saliency detection model, reducing the probability of missing such small-target salient objects.
In step S202, an image to be detected is acquired.
In step S203, the image to be detected is detected by the saliency detection model, and all salient objects in the image to be detected are obtained.
In step S204, a saliency score is calculated for each salient object, respectively.
In step S205, all salient objects are subjected to saliency sorting according to the saliency scores, and the salient object with the largest score value after sorting is determined as the target salient object in the image to be detected.
In the embodiment of the present invention, the detailed implementation of steps S202 to S205 can refer to the description of steps S101 to S104 in the first embodiment, and will not be described herein again.
In the embodiment of the invention, a preset neural network is trained with training data composed of an image data set without salient objects and an image data set containing salient objects to obtain a saliency detection model; all salient objects in the image to be detected are then detected through the saliency detection model, the saliency score of each salient object is calculated separately, and all salient objects are sorted by saliency according to the saliency scores to obtain the most salient target object in the image to be detected, thereby improving the recognition speed and recognition accuracy for salient objects in multi-scene images.
Example three:
fig. 3 shows an implementation flow of a salient object detection method for an image provided by the third embodiment of the present invention, and for convenience of description, only the parts related to the third embodiment of the present invention are shown, which are detailed as follows:
in step S301, an image to be detected is acquired.
The embodiment of the invention is applicable to image processing devices for image display, acquisition and the like. In the embodiment of the present invention, the image to be detected may be captured in real time by a mobile electronic device with a camera, or may be acquired by the image processing device from a preset storage location (e.g., a cloud storage space).
In step S302, the image to be detected is detected through the improved U-Net network to obtain all salient objects in the image to be detected, wherein the downsampling layers of the improved U-Net network include skip connection modules.
In the embodiment of the invention, the saliency detection model is an improved U-Net network that includes a skip connection module in its downsampling layers. The improved U-Net network performs feature extraction and image segmentation on the input image to be detected to obtain all salient objects in the image, together with the relevant attribute information of each salient object (such as its contour region, position information, and color). The skip connection module does not change the overall U-Net structure; a skip connection module is arranged in the downsampling stage of each layer of the U-shaped U-Net structure.
Preferably, the skip connection module in the downsampling structure of the improved U-Net network comprises a depthwise separable convolution layer (SepConv) and a max pooling layer (Max Pooling), so that details of small-target salient objects in an image are not lost excessively during downsampling, reducing the probability of missing such small-target salient objects.
Further preferably, fig. 4 shows the structure of a skip connection module, which comprises 2 SepConv layers, a Leaky Rectified Linear Unit (Leaky ReLU) activation function, and a Max Pooling layer. The skip connection implemented by the Max Pooling layer compresses the features before downsampling and passes the compressed features directly to the feature extraction module after downsampling, so that more of the original features before downsampling are retained; this further prevents excessive loss of detail of small-target salient objects during downsampling and reduces the probability of missing them. Illustratively, after a feature a is input into the skip connection module, it passes through the 2 SepConv layers of depthwise separable convolution to obtain a feature b; meanwhile, the Max Pooling layer performs max pooling on the feature a to obtain a feature c; finally, the skip connection module fuses the features b and c to obtain and output a feature d.
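The following PyTorch sketch shows one possible realization of the skip connection module of fig. 4, under stated assumptions: the text fixes only the combination of 2 SepConv layers, a Leaky ReLU and a Max Pooling layer with feature fusion, so the stride used to match the pooled resolution and the fusion by concatenation plus a 1×1 convolution are interpretation choices, not details from the embodiment.

```python
import torch
import torch.nn as nn

class SepConv(nn.Module):
    """Depthwise separable convolution: depthwise 3x3 followed by pointwise 1x1."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, stride=stride, padding=1, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)
        self.act = nn.LeakyReLU(0.1, inplace=True)

    def forward(self, x):
        return self.act(self.pointwise(self.depthwise(x)))

class SkipConnectionDown(nn.Module):
    """Downsampling block: feature a -> two SepConv layers -> b, a -> max pool -> c,
    then b and c are fused (here by channel concatenation) into the output d."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv_path = nn.Sequential(
            SepConv(in_ch, out_ch),
            SepConv(out_ch, out_ch, stride=2),       # halve resolution to match the pooled path
        )
        self.pool_path = nn.MaxPool2d(kernel_size=2, stride=2)
        self.fuse = nn.Conv2d(out_ch + in_ch, out_ch, 1)   # 1x1 conv after concatenation

    def forward(self, a):
        b = self.conv_path(a)                        # feature b
        c = self.pool_path(a)                        # feature c (compressed original features)
        d = self.fuse(torch.cat([b, c], dim=1))      # fused feature d
        return d
```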
In step S303, a saliency score is calculated for each salient object, respectively.
In step S304, all salient objects are subjected to saliency sorting according to the saliency scores, and the salient object with the highest score value after sorting is determined as the target salient object in the image to be detected.
In the embodiment of the present invention, the detailed implementation of steps S303 to S304 may refer to the description of steps S103 to S104 in the first embodiment, and will not be described herein again.
In the embodiment of the invention, all salient objects in the image to be detected are detected through an improved U-Net network containing skip connection modules in its downsampling layers, the saliency score of each salient object is calculated separately, and all salient objects are sorted by saliency according to the saliency scores to obtain the most salient target object in the image to be detected, thereby improving the recognition speed and recognition accuracy for salient objects in multi-scene images.
Example four:
fig. 5 shows a structure of a salient object detection apparatus of an image according to a fourth embodiment of the present invention, and for convenience of description, only a part related to the embodiment of the present invention is shown, where:
a detection image acquisition unit 51 for acquiring an image to be detected.
The embodiment of the invention is applicable to image processing devices for image display, acquisition and the like. In the embodiment of the present invention, the image to be detected may be captured in real time by a mobile electronic device with a camera, or may be acquired by the image processing device from a preset storage location (e.g., a cloud storage space).
A salient object obtaining unit 52 for detecting the image to be detected through the saliency detection model to obtain all salient objects in the image to be detected.
In the embodiment of the invention, the acquired image to be detected is detected through the saliency detection model to obtain all salient objects in the image to be detected, together with the relevant attribute information of each salient object (such as its contour region, position information, and color).
A saliency score calculation unit 53 for calculating a saliency score of each salient object separately.
In the embodiment of the present invention, the saliency score of each salient object is calculated separately according to the relevant attribute information of that salient object, and the saliency score is computed at the pixel level.
A saliency sorting unit 54 for sorting all salient objects by saliency according to the saliency scores and determining the salient object with the largest score after sorting as the target salient object in the image to be detected.
In the embodiment of the invention, all salient objects are sorted by their saliency scores, in ascending or descending order; the salient object with the largest saliency score is the most salient target object in the current image to be detected and is determined as the target salient object in the image to be detected.
In the embodiment of the present invention, each unit of the salient object detecting apparatus for an image may be implemented by a corresponding hardware or software unit, and each unit may be an independent software or hardware unit, or may be integrated into a software or hardware unit, which is not limited herein.
Example five:
fig. 6 shows a structure of a salient object detection apparatus of an image provided in the fifth embodiment of the present invention, and for convenience of description, only a part related to the fifth embodiment of the present invention is shown, where:
The detection model training unit 61 is configured to perform learning training of the mapping relation between an image and the salient objects in the image on a preset neural network through preset training data to obtain a saliency detection model, where the training data include an image data set without salient objects and an image data set containing salient objects.
The embodiment of the invention is applicable to image processing devices for image display, acquisition and the like. In the embodiment of the present invention, the training data composed of the image data set without salient objects and the image data set containing salient objects may be a standard data set, such as the ImageNet data set, or a customized image training data set, where an image in the data set containing salient objects may contain one or more salient objects. When the preset neural network is trained, the fine contour of each salient object is first annotated manually on the image data set containing salient objects, but the annotated salient objects are not divided into specific categories; that is, all salient objects are assigned to one class and the other, non-salient regions of the image are assigned to another class, yielding image pairs of an image and its saliency result. The annotated image data set and the image data set without salient objects are then used to train the preset neural network on the mapping relation between an image and the salient objects in the image, obtaining the saliency detection model and thereby improving the training speed and training effect of the network.
A detection image acquisition unit 62 for acquiring an image to be detected.
In the embodiment of the present invention, the image to be detected may be captured in real time by a mobile electronic device with a camera, or may be acquired from a preset storage location (e.g., a cloud storage space) by an image processing device.
A salient object obtaining unit 63 for detecting the image to be detected through the saliency detection model to obtain all salient objects in the image to be detected.
In the embodiment of the invention, the acquired image to be detected is detected through the saliency detection model to obtain all salient objects in the image to be detected, together with the relevant attribute information of each salient object (such as its contour region, position information, and color).
When the image to be detected is detected through the saliency detection model, preferably, feature extraction and image segmentation are performed on the input image through a U-Net network and/or a classical saliency detection network, thereby improving the saliency distinctiveness and accuracy of the detection.
Still preferably, the saliency detection model is an improved U-Net network including skip connection modules in its downsampling layers, where each skip connection module comprises a depthwise separable convolution layer (SepConv) and a max pooling layer (Max Pooling); the skip connection modules do not change the overall U-Net structure, and a skip connection module is arranged in the downsampling stage of each layer of the U-Net network, so that details of small-target salient objects in an image are not lost excessively during downsampling, reducing the probability of missing such small-target salient objects.
Further preferably, the skip connection module comprises 2 SepConv layers, a Leaky Rectified Linear Unit (Leaky ReLU) activation function, and a Max Pooling layer. The skip connection implemented by the Max Pooling layer compresses the features before downsampling and passes the compressed features directly to the feature extraction module after downsampling, so that more of the original features before downsampling are retained; this further prevents excessive loss of detail of small-target salient objects during downsampling and reduces the probability of missing them. Illustratively, after a feature a is input into the skip connection module, it passes through the 2 SepConv layers of depthwise separable convolution to obtain a feature b; meanwhile, the Max Pooling layer performs max pooling on the feature a to obtain a feature c; finally, the skip connection module fuses the features b and c to obtain and output a feature d.
A saliency score calculation unit 64 for calculating a saliency score of each salient object separately.
In the embodiment of the present invention, the saliency score of each salient object is calculated separately according to the relevant attribute information of that salient object, and the saliency score is computed at the pixel level.
A saliency sorting unit 65 for sorting all salient objects by saliency according to the saliency scores and determining the salient object with the largest score after sorting as the target salient object in the image to be detected.
In the embodiment of the invention, all salient objects are sorted by their saliency scores, in ascending or descending order; the salient object with the largest saliency score is the most salient target object in the current image to be detected and is determined as the target salient object in the image to be detected.
Preferably, the saliency score calculation unit 64 includes:
A first mean calculation unit 641, configured to calculate a first saliency score of each salient object separately and calculate a first saliency mean according to the obtained first saliency scores of all salient objects.
In the embodiment of the present invention, the first saliency score of each salient object is calculated according to the relevant attribute information of that salient object, and the first saliency mean is calculated from the obtained first saliency scores of all salient objects.
As an example, the size of each salient object relative to the image to be detected and the color difference between each salient object and the image to be detected may be determined from the contour region, position information and color of the salient object; a first saliency score of each salient object may then be determined from this relative size and color difference, and finally the average of the first saliency scores of all salient objects may be computed to obtain the first saliency mean.
A threshold determination unit 642, configured to determine a saliency threshold according to the first saliency mean.
In the embodiment of the present invention, the saliency threshold is determined according to the first saliency mean and is smaller than the first saliency mean; for example, if the first saliency mean is M0, the saliency threshold may be M1 = 0.2 × M0.
A region clipping unit 643, configured to clip the contour region of each salient object separately according to the saliency threshold.
In the embodiment of the invention, the contour region of each salient object is clipped separately according to the saliency threshold, and only the part of each contour region whose saliency is higher than the current saliency threshold is retained.
A second mean calculation unit 644, configured to calculate a second saliency mean according to all the clipped salient objects.
In the embodiment of the invention, the saliency of each salient object is recalculated from the retained, clipped contour region of that object, and the second saliency mean is calculated from these recalculated values.
A score calculation unit 645, configured to calculate a second saliency score of each salient object separately according to the calculated second saliency mean and a preset scale coefficient, and determine the obtained second saliency score as the saliency score.
In the embodiment of the invention, the scale coefficient is determined by the area of each salient object, with a larger area giving a larger scale coefficient; the second saliency score of each salient object is obtained by multiplying the calculated second saliency mean by the scale coefficient of that object.
In the embodiment of the present invention, each unit of the salient object detecting apparatus for an image may be implemented by a corresponding hardware or software unit, and each unit may be an independent software or hardware unit, or may be integrated into a software or hardware unit, which is not limited herein.
Example six:
fig. 7 shows a configuration of an image processing apparatus according to a sixth embodiment of the present invention, and for convenience of explanation, only a part related to the embodiment of the present invention is shown.
The image processing apparatus 7 of the embodiment of the present invention includes a processor 70, a memory 71, and a computer program 72 stored in the memory 71 and executable on the processor 70. The processor 70, when executing the computer program 72, implements the steps in the above-described embodiment of the method for detecting salient objects of an image, such as the steps S101 to S104 shown in fig. 1. Alternatively, the processor 70, when executing the computer program 72, implements the functions of the units in the above-described apparatus embodiments, such as the functions of the units 51 to 54 shown in fig. 5.
In the embodiment of the invention, all salient objects in the image to be detected are detected through the saliency detection model, the saliency score of each salient object is calculated separately, all salient objects are sorted by saliency according to the saliency scores, and the salient object with the largest score after sorting is determined as the target salient object in the image to be detected, thereby improving the recognition speed and recognition accuracy for salient objects in multi-scene images.
The image processing device of the embodiment of the invention can be a smart phone or a personal computer. For the steps implemented when the processor 70 in the image processing device 7 executes the computer program 72 to carry out the salient object detection method, reference may be made to the description of the foregoing method embodiments, which is not repeated here.
Example seven:
In an embodiment of the present invention, a computer-readable storage medium is provided, which stores a computer program that, when executed by a processor, implements the steps of the above salient object detection method embodiment, for example, steps S101 to S104 shown in fig. 1. Alternatively, the computer program, when executed by the processor, implements the functions of the units of the above device embodiments, such as the functions of the units 51 to 54 shown in fig. 5.
In the embodiment of the invention, all salient objects in the image to be detected are detected through the saliency detection model, the saliency score of each salient object is calculated separately, all salient objects are sorted by saliency according to the saliency scores, and the salient object with the largest score after sorting is determined as the target salient object in the image to be detected, thereby improving the recognition speed and recognition accuracy for salient objects in multi-scene images.
The computer-readable storage medium of the embodiments of the present invention may include any entity or device capable of carrying computer program code, or a recording medium such as a ROM/RAM, a magnetic disk, an optical disk, or a flash memory.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.