CN109344818B - A salient object detection method in light field based on deep convolutional network - Google Patents
A salient object detection method in light field based on deep convolutional network
- Publication number
CN109344818B (application CN201811141315.2A)
- Authority
- CN
- China
- Prior art keywords
- light field
- layer
- image
- neural network
- field data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/10—Image acquisition
- G06V10/12—Details of acquisition arrangements; Constructional details thereof
- G06V10/14—Optical characteristics of the device performing the acquisition or on the illumination arrangements
- G06V10/145—Illumination specially adapted for pattern recognition, e.g. using gratings
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Biophysics (AREA)
- Evolutionary Computation (AREA)
- Computational Linguistics (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a light field salient object detection method based on a deep convolutional network, which comprises the following steps: 1, converting light field data obtained with a light field acquisition device into sub-aperture images of all viewing angles; 2, recombining the sub-aperture images at different viewing angles into a microlens image; 3, performing data enhancement on the microlens image; 4, building a salient object detection model that takes the microlens image as input, on the basis of the pre-trained weights of the DeepLab-V2 network, and training it with a data set; and 5, performing salient object detection on the light field data to be processed by using the trained salient object detection model. The method can effectively improve the accuracy of salient object detection for complex scene images.
Description
Technical Field
The invention belongs to the fields of computer vision and image processing and analysis, and particularly relates to a light field salient object detection method based on a deep convolutional network.
Background
Salient object detection mimics a perceptual capability of the human visual system. When observing an image, the visual system rapidly locates the regions and objects of interest; the process of locating them is salient object detection. With the development of computer technology and the internet and the popularization of mobile intelligent devices, the number of images people acquire from the outside world has grown explosively. Salient object detection selects a small part of the large amount of input visual information for subsequent complex processing, such as object detection and recognition, image retrieval and image segmentation, thereby effectively reducing the computation load of the visual system. At present, salient object detection has become one of the research hot spots in the field of computer vision.
Current salient object detection methods can be classified into three categories according to the image data available: two-dimensional salient object detection, three-dimensional salient object detection, and light field salient object detection.
Two-dimensional salient object detection methods acquire a two-dimensional image with a conventional camera and, using either traditional or learning-based approaches, extract and fuse features such as color, brightness, position and texture within a local or global contrast framework to distinguish salient from non-salient regions.
Three-dimensional salient object detection methods use a two-dimensional image together with the depth information of the scene. The depth information, acquired by a three-dimensional sensor, reflects the distance between an object and the observer and also plays an important role in the human visual system. Using depth for salient object detection compensates for the shortcomings of the traditional two-dimensional image: the final saliency map is obtained by exploiting the complementarity of color and depth, which improves detection accuracy to a certain extent.
Light field salient object detection methods process light field data acquired by a light field camera. As a new computational imaging technique, light field imaging records both the position and the viewing-angle information of light rays in a scene with a single exposure, and the acquired light field information reflects the geometry and reflectance characteristics of the natural scene. Existing methods improve detection performance in challenging scenes by fusing the salient features of different light field data.
Although salient object detection methods with excellent performance have appeared in the field of computer vision, they still have the following shortcomings:
1. In two-dimensional salient object detection, the two-dimensional image is the integral of the light projected onto the camera sensor and contains only the light intensity in a specific direction, so detection is overly sensitive to high-frequency content or noise and is easily affected by factors such as similar color and texture of foreground and background and cluttered backgrounds.
2. In three-dimensional salient object detection, the accuracy of the scene depth information depends on the depth camera, and existing depth cameras suffer from low resolution, narrow measurement range, high noise, inability to measure transmissive materials, and susceptibility to interference from sunlight and reflections off smooth surfaces.
3. In three-dimensional salient object detection, features such as color, depth and position are processed and fused independently, without fully considering their complementarity.
4. Most salient object detection methods based on two-dimensional and three-dimensional images rely on assumptions such as the object differing clearly from the background and the background being simple; as image data grows in scale and image content becomes more complex, these methods show clear limitations.
5. In light field salient object detection, research on using light field data for saliency is only just starting, the currently available data sets are few, and their image quality is poor. Existing methods that detect salient objects from light field data rely on traditional hand-crafted saliency computation and model multiple cues such as color, depth and refocusing separately, so they suffer from insufficient feature expression and a lack of robustness.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a light field salient object detection method based on a deep convolutional network, so that the spatial information and the viewing-angle information of the light field data can be fully exploited and the accuracy of salient object detection in complex scene images can be effectively improved.
The invention adopts the following technical scheme for solving the technical problems:
The invention relates to a light field salient object detection method based on a deep convolutional network, which is characterized by comprising the following steps:
Step 1.1, acquiring a light field file with a light field device and decoding it to obtain a light field data set, denoted L = (L_1, L_2, …, L_d, …, L_D), wherein L_d represents the d-th light field data and is written L_d(u, v, s, t); u and v represent any horizontal pixel and vertical pixel in the spatial information, and s and t represent any horizontal viewing angle and vertical viewing angle in the viewing-angle information; d ∈ [1, D], and D represents the total number of light field data;
Step 1.2, fixing a horizontal viewing angle s and a vertical viewing angle t, and traversing all horizontal and vertical pixels of the d-th light field data L_d(u, v, s, t) to obtain the sub-aperture image at the viewing angle in the s-th row and t-th column of L_d(u, v, s, t); its height and width are denoted V and U respectively, and v ∈ [1, V], u ∈ [1, U];
Step 1.3, traversing all horizontal viewing angles and vertical viewing angles of the d-th light field data L_d(u, v, s, t) to obtain the sub-aperture image set N_d under all viewing angles, wherein s ∈ [1, S], t ∈ [1, T]; S represents the row of the maximum horizontal viewing angle, and T represents the column of the maximum vertical viewing angle;
Step 1.4, defining the number of selected viewing angles as m × m, and using formula (1) to select, from the sub-aperture image set N_d under all viewing angles, the d-th image set M_d centered on the central viewing angle:
Step 1.5, according to x = (v − 1) × m + t and y = (u − 1) × m + s, obtaining the pixel I_d(x, y) in the x-th row and y-th column of the d-th microlens image I_d, thereby obtaining the d-th microlens image I_d with height H and width W, wherein x ∈ [1, H], y ∈ [1, W], H = V × m, W = U × m;
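The recombination in steps 1.2 to 1.5 can be illustrated with a short NumPy sketch. It assumes the decoded light field L_d is already available as an array of shape (S, T, V, U, C) holding the sub-aperture images of all viewing angles; the array layout, function name and 0-based indexing are illustrative assumptions, not the patent's implementation.

```python
import numpy as np

def build_microlens_image(L_d, m):
    """Recombine a decoded 4D light field into a microlens image (steps 1.2-1.5).

    L_d : ndarray of shape (S, T, V, U, C) -- sub-aperture images of all
          S x T viewing angles, each of spatial size V x U (assumed layout).
    m   : number of selected viewing angles per axis; the central m x m
          views are kept (step 1.4).
    """
    S, T, V, U, C = L_d.shape
    # Step 1.4: keep the m x m block of views centred on the central view.
    s0, t0 = (S - m) // 2, (T - m) // 2
    M_d = L_d[s0:s0 + m, t0:t0 + m]                  # (m, m, V, U, C)

    # Step 1.5 (0-based form of x = (v-1)*m + t, y = (u-1)*m + s):
    # every spatial position (v, u) expands into an m x m block of views.
    H, W = V * m, U * m
    I_d = np.zeros((H, W, C), dtype=L_d.dtype)
    for s in range(m):
        for t in range(m):
            I_d[t::m, s::m] = M_d[s, t]              # interleave the views
    return I_d
```

With m = 9 and sub-aperture images of 375 × 540 pixels as in the embodiment below, this yields a 3375 × 4860 microlens image.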
Step 3, performing data enhancement on the d-th microlens image I_d to obtain the d-th enhanced microlens image set I'_d; performing geometric transformation on the d-th real saliency map G_d to obtain the d-th transformed real saliency map set G'_d;
Step 4, repeating steps 1.2 to 3 to obtain the D enhanced microlens image sets in the light field data set L, denoted I' = (I'_1, I'_2, …, I'_d, …, I'_D), and the D transformed real saliency map sets, denoted G' = (G'_1, G'_2, …, G'_d, …, G'_D);
Step 5, constructing the salient object detection model of the d-th light field data L_d(u, v, s, t);
Step 5.1, acquiring a c-layer DeepLab-V2 convolutional neural network, the DeepLab-V2 convolutional neural network comprising convolutional layers, pooling layers and dropout (discarding) layers;
Step 5.2, modifying the c-layer DeepLab-V2 convolutional neural network to obtain a modified LFnet convolutional neural network;
Step 5.2.1, adding, before the first layer of the DeepLab-V2 convolutional neural network, a convolutional layer LF_conv1_1 with a convolution kernel of size m × m and a ReLU activation function LF_ReLU1_1;
setting the moving stride of the convolution kernel to m when the convolutional layer LF_conv1_1 performs the convolution operation;
the mathematical expression of the ReLU activation function LF_ReLU1_1 is φ(a) = max(0, a), wherein a represents the output of the convolutional layer LF_conv1_1 and is the input to the ReLU activation function LF_ReLU1_1, and φ(a) represents the output of the ReLU activation function LF_ReLU1_1;
Step 5.2.2, adding a dropout (discarding) layer after every other convolutional layer of the DeepLab-V2 convolutional neural network, except the convolutional layer LF_conv1_1 and those convolutional layers in the DeepLab-V2 convolutional neural network that are already connected to a dropout layer;
Step 5.2.3, setting the number of output channels of the (c − 1)-th layer of the DeepLab-V2 convolutional neural network to b, wherein b is the number of pixel classes;
Step 5.2.4, adding an upsampling layer after the c-th layer of the DeepLab-V2 convolutional neural network, and using the upsampling layer to upsample the feature map F_d(q, r, b) output by the c-th layer of the DeepLab-V2 convolutional neural network to obtain the upsampled feature map F'_d(q, r, b); wherein q, r and b represent the width, height and number of channels of the feature map F_d(q, r, b) respectively;
Step 5.2.5, adding a crop (shear) layer after the upsampling layer and, according to the height V and width U of the d-th real saliency map G_d, using the crop layer to crop the feature map F'_d(q, r, b) to obtain the pixel-class prediction probability map F''_d(q, r, b) of the microlens image I_d;
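A minimal PyTorch sketch of the modifications in steps 5.2.1 to 5.2.5 is given below. The `backbone` stands for the c-layer DeepLab-V2 network (with the extra dropout layers of step 5.2.2 assumed to be inserted inside it and its last layer assumed to output b = 2 channels); the class name, channel counts and the ×8 upsampling factor are illustrative assumptions, not values stated in the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LFNet(nn.Module):
    """Sketch of the modifications of steps 5.2.1-5.2.5 around a DeepLab-V2 backbone."""

    def __init__(self, backbone, m=9, in_ch=3, mid_ch=3):
        super().__init__()
        # Step 5.2.1: m x m convolution with stride m fuses the m x m viewing
        # angles interleaved around every spatial position of the microlens image.
        self.lf_conv1_1 = nn.Conv2d(in_ch, mid_ch, kernel_size=m, stride=m)
        self.lf_relu1_1 = nn.ReLU(inplace=True)        # phi(a) = max(0, a)
        # Steps 5.2.2/5.2.3: the backbone is assumed to already contain the extra
        # dropout layers and to output b = 2 channels from its (c-1)-th layer.
        self.backbone = backbone

    def forward(self, x, out_size):
        a = self.lf_relu1_1(self.lf_conv1_1(x))        # (B, mid_ch, V, U)
        f = self.backbone(a)                           # F_d(q, r, b)
        # Step 5.2.4: upsample (the x8 factor is an assumption for DeepLab-V2).
        f_up = F.interpolate(f, scale_factor=8, mode='bilinear',
                             align_corners=False)      # F'_d(q, r, b)
        # Step 5.2.5: crop to the height V and width U of the real saliency map.
        return f_up[:, :, :out_size[0], :out_size[1]]  # F''_d(q, r, b)
```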
Step 5.3, taking the enhanced microlens image set I' as the input of the LFnet convolutional neural network and the transformed real saliency map set G' as the labels, using a cross-entropy loss function, and training the LFnet convolutional neural network with a gradient descent algorithm, thereby obtaining the salient object detection model of the light field data; salient object detection on light field data is then realized with the salient object detection model.
Compared with the prior art, the invention has the beneficial effects that:
1. The invention uses a second-generation light field camera to collect light field data of complex and varied scenes, which include difficulties such as salient objects of various sizes, various light sources, similarity between the salient objects and the background, and cluttered backgrounds; this fully remedies the shortage of current light field saliency data in both quantity and difficulty and improves its quality.
2. The method exploits the powerful capability of deep convolutional networks in image processing to extract image features, fuses the spatial information and viewing-angle information of the light field data, captures the context of the microlens image with an atrous ('hole') pyramid network, and detects the salient objects in the image scene; this overcomes the inability of current two-dimensional and three-dimensional salient object detection methods to use viewing-angle information, and improves the precision and robustness of salient object detection in complex scenes.
3. The multi-view information in the microlens image reflects the spatial geometry of the scene. Feeding the microlens image directly into the convolutional neural network to perform salient object detection overcomes the separate handling of depth and color information in current light field salient object detection methods; depth perception and visual saliency are considered jointly, the complementarity of depth and color is exploited effectively, and the accuracy of salient object detection is improved.
Drawings
FIG. 1 is a flow chart of the salient object detection method of the present invention;
FIG. 2 is a sub-aperture image obtained by the method of the present invention;
FIG. 3 is a microlens image obtained by the method of the present invention;
FIG. 4 is a partial scene and a true saliency map of a data set acquired by the method of the present invention;
FIG. 5 is a detailed process diagram of the microlens image input network model according to the method of the present invention;
FIG. 6 is a diagram of the Deeplab-V2 model used in the method of the present invention;
FIG. 7 is a comparison graph of detection results of some salient objects obtained by the method of the present invention and other light field salient object detection methods on a data set collected by a second generation light field camera;
FIG. 8 is a quantitative comparison, on the data set acquired with the second-generation light field camera, between the method of the present invention and other current light field saliency extraction methods, using the recall/precision curve as the metric.
Detailed Description
In this embodiment, a light field salient object detection method based on a deep convolutional network, shown in FIG. 1, is performed according to the following steps:
Step 1.1, acquiring a light field file with a light field device and decoding it to obtain a light field data set, denoted L = (L_1, L_2, …, L_d, …, L_D), wherein L_d represents the d-th light field data and is written L_d(u, v, s, t); u and v represent any horizontal pixel and vertical pixel in the spatial information, and s and t represent any horizontal viewing angle and vertical viewing angle in the viewing-angle information; d ∈ [1, D], and D represents the total number of light field data;
In this embodiment, a second-generation light field camera is used to acquire the light field file, which is decoded with the Lytro Power Tools (beta) to obtain the light field data L_d(u, v, s, t). The light field data L_d(u, v, s, t) is expressed with the two-plane parameterization: in the four-dimensional (u, v, s, t) coordinate space, each light ray corresponds to one sample point of the light field; the (u, v) plane is the spatial information plane and the (s, t) plane is the viewing-angle information plane. In the experiments of the invention, 640 light field data were acquired and divided evenly into 5 folds; one fold is selected in turn as the test set and the other 4 folds are used as the training set. D in step 1.1 represents the size of the training data set, D = 512;
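The five-fold rotation of training and test sets described above can be sketched as follows; the shuffling and the generator interface are assumptions, and only the 640/5 split and the rotation of one fold into the test role come from the text.

```python
import numpy as np

def five_fold_splits(num_samples=640, num_folds=5, seed=0):
    """Yield (train, test) index arrays, rotating each fold through the test role."""
    rng = np.random.default_rng(seed)                # the shuffle is an assumption
    idx = rng.permutation(num_samples)
    folds = np.array_split(idx, num_folds)           # 5 folds of 128 light fields
    for k in range(num_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(num_folds) if j != k])
        yield train, test                             # 512 training / 128 test
```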
Step 1.2, fixing a horizontal viewing angle s and a vertical viewing angle t, and traversing all horizontal and vertical pixels of the d-th light field data L_d(u, v, s, t) to obtain the sub-aperture image at the viewing angle in the s-th row and t-th column of L_d(u, v, s, t); its height and width are denoted V and U respectively, and v ∈ [1, V], u ∈ [1, U]; in this experiment, V = 375, U = 540;
Step 1.3, traversing all horizontal viewing angles and vertical viewing angles of the d-th light field data L_d(u, v, s, t) to obtain the sub-aperture image set N_d under all viewing angles, wherein s ∈ [1, S], t ∈ [1, T]; S represents the row of the maximum horizontal viewing angle, and T represents the column of the maximum vertical viewing angle; in this embodiment, S = 14, T = 14. As shown in FIG. 2, the left image in FIG. 2 is the set of sub-aperture images of all viewing angles, and the right image in FIG. 2 is the sub-aperture image at the viewing angle in row 6, column 11.
Step 1.4, defining the number of selected viewing angles as m × m, and using formula (1) to select, from the sub-aperture image set N_d under all viewing angles, the d-th image set M_d centered on the central viewing angle; in this implementation, m = 9, so 81 view images are selected in total. Experiments show that more viewing angles provide more information and can further improve the performance of the salient object detection model; however, more viewing angles consume a large amount of storage and computation time and increase the difficulty of the experiments;
Step 1.5, according to x = (v − 1) × m + t and y = (u − 1) × m + s, obtaining the pixel I_d(x, y) in the x-th row and y-th column of the d-th microlens image I_d, thereby obtaining the d-th microlens image I_d with height H and width W, as shown in FIG. 3, wherein x ∈ [1, H], y ∈ [1, W], H = V × m, W = U × m; in this embodiment, H = 3375, W = 4860. The left image in FIG. 3 is the microlens image I_d and the right image in FIG. 3 is a partial enlargement of I_d; in the enlarged part, all pixels within one grid cell form the set of pixels that share the same spatial information but have different viewing-angle information.
Step 3, performing data enhancement on the d-th microlens image I_d to obtain the d-th enhanced microlens image set I'_d; performing geometric transformation on the d-th real saliency map G_d to obtain the d-th transformed real saliency map set G'_d. In this embodiment, the data enhancement of the d-th microlens image I_d is realized by rotation, flipping, increasing the chroma, increasing the contrast, increasing the brightness, decreasing the brightness and adding Gaussian noise; the data enhancement improves the generalization ability of the salient object detection model.
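A minimal sketch of the listed enhancements, assuming 8-bit RGB microlens images stored as NumPy arrays; the concrete factors (brightness ±20 %, contrast ×1.3, noise σ = 8) and the omission of the chroma adjustment are illustrative assumptions, and the geometric transforms (rotation, flip) would be applied identically to the real saliency map G_d.

```python
import numpy as np

def augment_microlens_image(img, rng):
    """Return a list of enhanced versions of an 8-bit RGB microlens image."""
    out = [img]
    out.append(np.rot90(img, k=1, axes=(0, 1)))              # rotation
    out.append(img[:, ::-1])                                 # horizontal flip
    out.append(np.clip(img * 1.2, 0, 255))                   # increase brightness
    out.append(np.clip(img * 0.8, 0, 255))                   # decrease brightness
    mean = img.mean(axis=(0, 1), keepdims=True)
    out.append(np.clip((img - mean) * 1.3 + mean, 0, 255))   # increase contrast
    out.append(np.clip(img + rng.normal(0.0, 8.0, img.shape), 0, 255))  # Gaussian noise
    return [a.astype(img.dtype) for a in out]
```

A call such as `augment_microlens_image(I_d, np.random.default_rng(0))` would yield the original image plus six enhanced copies.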
Step 4, repeating steps 1.2 to 3 to obtain the D enhanced microlens image sets in the light field data set L, denoted I' = (I'_1, I'_2, …, I'_d, …, I'_D), and the D transformed real saliency map sets, denoted G' = (G'_1, G'_2, …, G'_d, …, G'_D);
Step 5, constructing the salient object detection model of the d-th light field data L_d(u, v, s, t);
Step 5.1, acquiring a c-layer DeepLab-V2 convolutional neural network, which consists of 16 convolutional layers, 5 pooling layers, 2 dropout (discarding) layers and 1 merging layer and is used for semantic segmentation; its detailed structure is shown in FIG. 6. The atrous ('hole') pyramid structure contained in the DeepLab-V2 convolutional neural network captures the context of the image at multiple scales, enabling salient object detection at multiple scales.
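The 'hole' pyramid mentioned above can be sketched as parallel dilated convolutions whose outputs are summed; the dilation rates 6, 12, 18 and 24 follow the published DeepLab-V2 design and are an assumption here, not values given in the patent.

```python
import torch.nn as nn

class ASPPHead(nn.Module):
    """Parallel dilated ('hole') convolutions whose outputs are summed."""

    def __init__(self, in_ch, num_classes=2, rates=(6, 12, 18, 24)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, num_classes, kernel_size=3, padding=r, dilation=r)
            for r in rates)

    def forward(self, x):
        # Each branch sees a different receptive field; summing fuses the
        # multi-scale context before the prediction is upsampled and cropped.
        return sum(branch(x) for branch in self.branches)
```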
Step 5.2, modifying the c-layer DeepLab-V2 convolutional neural network to obtain a modified LFnet convolutional neural network, the detailed structure of which is shown in FIG. 5;
Step 5.2.1, adding, before the first layer of the DeepLab-V2 convolutional neural network, a convolutional layer LF_conv1_1 with a convolution kernel of size m × m and a ReLU activation function LF_ReLU1_1;
setting the moving stride of the convolution kernel to m when the convolutional layer LF_conv1_1 performs the convolution operation; in this implementation, m = 9. When the microlens image I_d is constructed in steps 1.4 and 1.5, the number of selected viewing angles is 9 × 9; so that the network can better extract and fuse the multi-view information, the convolution kernel size of the layer LF_conv1_1 is set to 9 × 9 and the stride to 9;
the mathematical expression of the ReLU activation function LF_ReLU1_1 is φ(a) = max(0, a), wherein a represents the output of the convolutional layer LF_conv1_1 and is the input to the ReLU activation function LF_ReLU1_1, and φ(a) represents the output of the ReLU activation function LF_ReLU1_1;
Step 5.2.2, adding a dropout (discarding) layer after every other convolutional layer of the DeepLab-V2 convolutional neural network, except the convolutional layer LF_conv1_1 and those convolutional layers in the DeepLab-V2 convolutional neural network that are already connected to a dropout layer; in this embodiment, adding the dropout layers effectively prevents overfitting and improves the generalization ability of the salient object detection model;
Step 5.2.3, setting the number of output channels of the (c − 1)-th layer of the DeepLab-V2 convolutional neural network to b, wherein b is the number of pixel classes; in this specific embodiment, c − 1 = 23 and b = 2, since the salient object detection model classifies pixels into the salient and non-salient classes.
Step 5.2.4, adding an upsampling layer after the c-th layer of the DeepLab-V2 convolutional neural network, and using the upsampling layer to upsample the feature map F_d(q, r, b) output by the c-th layer of the DeepLab-V2 convolutional neural network to obtain the upsampled feature map F'_d(q, r, b); wherein q, r and b represent the width, height and number of channels of the feature map F_d(q, r, b) respectively;
Step 5.2.5, adding a crop (shear) layer after the upsampling layer and, according to the height V and width U of the d-th real saliency map G_d, using the crop layer to crop the feature map F'_d(q, r, b) to obtain the pixel-class prediction probability map F''_d(q, r, b) of the microlens image I_d;
Step 5.3, taking the enhanced microlens image set I' as the input of the LFnet convolutional neural network and the transformed real saliency map set G' as the labels, using a cross-entropy loss function, and training the LFnet convolutional neural network with a gradient descent algorithm, thereby obtaining the salient object detection model of the light field data; salient object detection on light field data is then realized with the salient object detection model.
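Step 5.3 reduces to a standard pixel-wise classification training loop. The sketch below assumes the LFNet module from the earlier sketch and a data loader yielding (I'_d, G'_d) batches; the optimizer settings and epoch count are assumptions, since the patent only specifies a cross-entropy loss and a gradient descent algorithm.

```python
import torch
import torch.nn as nn

def train_lfnet(model, loader, epochs=20, lr=1e-3):
    """Train the sketched LFNet with pixel-wise cross-entropy and SGD."""
    criterion = nn.CrossEntropyLoss()                     # b = 2 classes per pixel
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    model.train()
    for _ in range(epochs):
        for images, labels in loader:                     # I'_d batches, G'_d labels
            logits = model(images, labels.shape[-2:])     # F''_d(q, r, b)
            loss = criterion(logits, labels.long())       # labels are 0/1 per pixel
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```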
The test set is processed according to steps 1.1 to 2 to obtain the microlens images of the test set, which are input into the salient object detection model to obtain the pixel-class prediction probability map F''_test(q, r, b) of the test set; the saliency map F''_s is extracted using equation (2), wherein F''_test(q, r, 2) in equation (2) represents the values of the second channel of the probability map F''_test(q, r, b); the saliency map F''_s is normalized to obtain the final saliency map F_s.
F''_s = F''_test(q, r, 2)    (2)
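Equation (2) and the normalization can be written as a small sketch, assuming the predicted probability map is a NumPy array of shape (q, r, b) with the salient class in the second channel.

```python
import numpy as np

def extract_saliency(prob_map):
    """Apply equation (2) and normalize: prob_map has shape (q, r, b), b = 2."""
    s = prob_map[:, :, 1]                                # F''_s = F''_test(q, r, 2)
    return (s - s.min()) / (s.max() - s.min() + 1e-8)    # final saliency map F_s
```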
In order to evaluate the performance of the salient object detection model obtained by the method more fairly, the training set and the test set are rotated in turn, and the average of the 5 test results is taken as the final index for evaluating the performance of the salient object detection model.
FIG. 7 is a qualitative comparison between the salient object detection method based on the deep convolutional network of the present invention and other current light field salient object detection methods, wherein Ours represents the salient object detection method based on the deep convolutional network of the present invention; Multi-cue represents a light field salient object detection method based on focus flow, view flow, depth and color; DILF represents a light field salient object detection method based on color, depth and background prior; WSC represents a light field salient object detection method based on sparse coding theory; and LFS represents a salient object detection method based on object and background modeling. All four comparison methods were tested on the real-scene data set collected with the second-generation light field camera used in the present invention.
Table 1 is a quantitative comparison, on the data set acquired with the second-generation light field camera, between the salient object detection method based on the deep convolutional network of the present invention and other current light field salient object detection methods, using the F-measure, WF-measure, average precision (AP) and mean absolute error (MAE) as metrics. The F-measure is a statistic derived from the recall/precision curve: the closer its value is to 1, the better the salient object detection. The WF-measure is a statistic derived from the weighted recall/precision curve: the closer to 1, the better. AP measures the average precision of the detection results: the closer to 1, the better. MAE measures the mean absolute difference between the detection results and the ground truth: the closer to 0, the better.
FIG. 8 is a quantitative comparison between the salient object detection method based on the deep convolutional network of the present invention and other current light field salient object detection methods, using the precision-recall (PR) curve as the metric; if one PR curve is completely enclosed by another, the latter performs better than the former.
TABLE 1
Salient object detection method | Ours | Multi-cue | DILF | WSC | LFS
---|---|---|---|---|---
F-measure | 0.8118 | 0.6649 | 0.6395 | 0.6452 | 0.6108
WF-measure | 0.7541 | 0.5420 | 0.4844 | 0.5946 | 0.3597
AP | 0.9124 | 0.6593 | 0.6922 | 0.5960 | 0.6193
MAE | 0.0551 | 0.1198 | 0.1390 | 0.1093 | 0.1698
As can be seen from the quantitative analysis in Table 1, the F-measure, WF-measure and AP obtained by the method of the present invention are higher than those of the other light field salient object detection methods, and its MAE is lower. As can be seen from the PR curves in FIG. 8, the recall/precision curve of the method of the present invention lies close to the upper-right corner and encloses the PR curves of all other methods; at the same recall, its probability of false detection is lower.
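For reference, the MAE and F-measure used in Table 1 could be computed as sketched below; the adaptive threshold (twice the mean saliency) and β² = 0.3 are common choices in the saliency-detection literature, assumed here rather than taken from the patent.

```python
import numpy as np

def mae(saliency, gt):
    """Mean absolute error between a [0, 1] saliency map and a binary ground truth."""
    return float(np.abs(saliency - gt).mean())

def f_measure(saliency, gt, beta2=0.3):
    """F-measure after binarizing with an adaptive threshold (2 x mean saliency)."""
    thr = min(2.0 * float(saliency.mean()), 1.0)
    pred = saliency >= thr
    tp = np.logical_and(pred, gt > 0.5).sum()
    precision = tp / (pred.sum() + 1e-8)
    recall = tp / ((gt > 0.5).sum() + 1e-8)
    return float((1 + beta2) * precision * recall / (beta2 * precision + recall + 1e-8))
```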
Claims (1)
1. A light field salient object detection method based on a deep convolutional network is characterized by comprising the following steps:
Step 1, obtaining a microlens image I_d;
Step 1.1, acquiring a light field file with a light field device and decoding it to obtain a light field data set, denoted L = (L_1, L_2, …, L_d, …, L_D), wherein L_d represents the d-th light field data and is written L_d(u, v, s, t); u and v represent any horizontal pixel and vertical pixel in the spatial information, and s and t represent any horizontal viewing angle and vertical viewing angle in the viewing-angle information; d ∈ [1, D], and D represents the total number of light field data;
Step 1.2, fixing a horizontal viewing angle s and a vertical viewing angle t, and traversing all horizontal and vertical pixels of the d-th light field data L_d(u, v, s, t) to obtain the sub-aperture image at the viewing angle in the s-th row and t-th column of L_d(u, v, s, t); its height and width are denoted V and U respectively, and v ∈ [1, V], u ∈ [1, U];
Step 1.3, traversing all horizontal viewing angles and vertical viewing angles of the d-th light field data L_d(u, v, s, t) to obtain the sub-aperture image set N_d under all viewing angles, wherein s ∈ [1, S], t ∈ [1, T]; S represents the row of the maximum horizontal viewing angle, and T represents the column of the maximum vertical viewing angle;
Step 1.4, defining the number of selected viewing angles as m × m, and using formula (1) to select, from the sub-aperture image set N_d under all viewing angles, the d-th image set M_d centered on the central viewing angle:
Step 1.5, according to x = (v − 1) × m + t and y = (u − 1) × m + s, obtaining the pixel I_d(x, y) in the x-th row and y-th column of the d-th microlens image I_d, thereby obtaining the d-th microlens image I_d with height H and width W, wherein x ∈ [1, H], y ∈ [1, W], H = V × m, W = U × m;
Step 2, selecting, from the d-th image set M_d, the sub-aperture image of the central viewing angle, recorded as the d-th central-view sub-aperture image; marking the salient region of the d-th central-view sub-aperture image, setting the pixels of the salient region to 1 and the pixels of the non-salient region to 0, thereby obtaining the real saliency map G_d of the d-th microlens image I_d, the height and width of the d-th real saliency map G_d being V and U respectively;
Step 3, performing data enhancement on the d-th microlens image I_d to obtain the d-th enhanced microlens image set I'_d; performing geometric transformation on the d-th real saliency map G_d to obtain the d-th transformed real saliency map set G'_d;
Step 4, repeating steps 1.2 to 3 to obtain the D enhanced microlens image sets in the light field data set L, denoted I' = (I'_1, I'_2, …, I'_d, …, I'_D), and the D transformed real saliency map sets, denoted G' = (G'_1, G'_2, …, G'_d, …, G'_D);
Step 5, constructing the salient object detection model of the d-th light field data L_d(u, v, s, t);
Step 5.1, acquiring a c-layer DeepLab-V2 convolutional neural network, the DeepLab-V2 convolutional neural network comprising convolutional layers, pooling layers and dropout (discarding) layers;
Step 5.2, modifying the c-layer DeepLab-V2 convolutional neural network to obtain a modified LFnet convolutional neural network;
Step 5.2.1, adding, before the first layer of the DeepLab-V2 convolutional neural network, a convolutional layer LF_conv1_1 with a convolution kernel of size m × m and a ReLU activation function LF_ReLU1_1;
setting the moving stride of the convolution kernel to m when the convolutional layer LF_conv1_1 performs the convolution operation;
the mathematical expression of the ReLU activation function LF_ReLU1_1 is φ(a) = max(0, a), wherein a represents the output of the convolutional layer LF_conv1_1 and is the input to the ReLU activation function LF_ReLU1_1, and φ(a) represents the output of the ReLU activation function LF_ReLU1_1;
Step 5.2.2, adding a dropout (discarding) layer after every other convolutional layer of the DeepLab-V2 convolutional neural network, except the convolutional layer LF_conv1_1 and those convolutional layers in the DeepLab-V2 convolutional neural network that are already connected to a dropout layer;
Step 5.2.3, setting the number of output channels of the (c − 1)-th layer of the DeepLab-V2 convolutional neural network to b, wherein b is the number of pixel classes;
Step 5.2.4, adding an upsampling layer after the c-th layer of the DeepLab-V2 convolutional neural network, and using the upsampling layer to upsample the feature map F_d(q, r, b) output by the c-th layer of the DeepLab-V2 convolutional neural network to obtain the upsampled feature map F'_d(q, r, b); wherein q, r and b represent the width, height and number of channels of the feature map F_d(q, r, b) respectively;
Step 5.2.5, adding a crop (shear) layer after the upsampling layer and, according to the height V and width U of the d-th real saliency map G_d, using the crop layer to crop the feature map F'_d(q, r, b) to obtain the pixel-class prediction probability map F''_d(q, r, b) of the microlens image I_d;
Step 5.3, taking the enhanced microlens image set I' as the input of the LFnet convolutional neural network and the transformed real saliency map set G' as the labels, using a cross-entropy loss function, and training the LFnet convolutional neural network with a gradient descent algorithm, thereby obtaining the salient object detection model of the light field data; salient object detection on light field data is then realized with the salient object detection model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811141315.2A CN109344818B (en) | 2018-09-28 | 2018-09-28 | A salient object detection method in light field based on deep convolutional network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811141315.2A CN109344818B (en) | 2018-09-28 | 2018-09-28 | A salient object detection method in light field based on deep convolutional network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109344818A CN109344818A (en) | 2019-02-15 |
CN109344818B true CN109344818B (en) | 2020-04-14 |
Family
ID=65307539
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811141315.2A Expired - Fee Related CN109344818B (en) | 2018-09-28 | 2018-09-28 | A salient object detection method in light field based on deep convolutional network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109344818B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111967288A (en) * | 2019-05-20 | 2020-11-20 | 万维数码智能有限公司 | Intelligent three-dimensional object identification and positioning system and method |
CN110441271B (en) * | 2019-07-15 | 2020-08-28 | 清华大学 | Light field high-resolution deconvolution method and system based on convolutional neural network |
CN111369522B (en) * | 2020-03-02 | 2022-03-15 | 合肥工业大学 | Light field significance target detection method based on generation of deconvolution neural network |
CN111445465B (en) * | 2020-03-31 | 2023-06-16 | 江南大学 | Method and equipment for detecting and removing snow or rain belt of light field image based on deep learning |
CN111931793B (en) * | 2020-08-17 | 2024-04-12 | 湖南城市学院 | Method and system for extracting saliency target |
CN113343822B (en) * | 2021-05-31 | 2022-08-19 | 合肥工业大学 | Light field saliency target detection method based on 3D convolution |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105701813A (en) * | 2016-01-11 | 2016-06-22 | 深圳市未来媒体技术研究院 | Significance detection method of light field image |
WO2018072858A1 (en) * | 2016-10-18 | 2018-04-26 | Photonic Sensors & Algorithms, S.L. | Device and method for obtaining distance information from views |
CN107993260A (en) * | 2017-12-14 | 2018-05-04 | 浙江工商大学 | A kind of light field image depth estimation method based on mixed type convolutional neural networks |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160203689A1 (en) * | 2015-01-08 | 2016-07-14 | Kenneth J. Hintz | Object Displacement Detector |
CN105913070B (en) * | 2016-04-29 | 2019-04-23 | 合肥工业大学 | A multi-cue saliency extraction method based on light field camera |
CN106981080A (en) * | 2017-02-24 | 2017-07-25 | 东华大学 | Night unmanned vehicle scene depth method of estimation based on infrared image and radar data |
2018-09-28: CN application CN201811141315.2A granted as patent CN109344818B (status: not active, Expired - Fee Related)
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105701813A (en) * | 2016-01-11 | 2016-06-22 | 深圳市未来媒体技术研究院 | Significance detection method of light field image |
WO2018072858A1 (en) * | 2016-10-18 | 2018-04-26 | Photonic Sensors & Algorithms, S.L. | Device and method for obtaining distance information from views |
CN107993260A (en) * | 2017-12-14 | 2018-05-04 | 浙江工商大学 | A kind of light field image depth estimation method based on mixed type convolutional neural networks |
Non-Patent Citations (4)
Title |
---|
Hao Sheng et al., "Occlusion-aware depth estimation for light field using multi-orientation EPIs", Pattern Recognition, vol. 74, pp. 587-599, Feb. 2018 *
Jun Zhang et al., "Saliency Detection on Light Field: A Multi-Cue Approach", ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), vol. 13, no. 3, pp. 32:1-32:19, Aug. 2017 *
Wang Lijuan, "Research on calibration methods and depth estimation of light field cameras" (光场相机的标定方法及深度估计研究), Wanfang Data Knowledge Service Platform, Jul. 2018, pp. 1-49 *
Luo Yaoxiang, "Research on depth estimation of light field images based on convolutional neural networks" (基于卷积神经网络的光场图像深度估计技术研究), Wanfang Data Knowledge Service Platform, Aug. 2018, pp. 1-50 *
Also Published As
Publication number | Publication date |
---|---|
CN109344818A (en) | 2019-02-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109344818B (en) | A salient object detection method in light field based on deep convolutional network | |
CN107194872B (en) | Super-resolution reconstruction method of remote sensing images based on content-aware deep learning network | |
Qi et al. | Volumetric and multi-view cnns for object classification on 3d data | |
WO2018023734A1 (en) | Significance testing method for 3d image | |
Feng et al. | Benchmark data set and method for depth estimation from light field images | |
CN108596108B (en) | Aerial remote sensing image change detection method based on triple semantic relation learning | |
CN113343822B (en) | Light field saliency target detection method based on 3D convolution | |
CN110910437B (en) | A Depth Prediction Method for Complex Indoor Scenes | |
CN117409192B (en) | A data-enhanced infrared small target detection method and device | |
CN110827312B (en) | Learning method based on cooperative visual attention neural network | |
CN104850850A (en) | Binocular stereoscopic vision image feature extraction method combining shape and color | |
CN115410074B (en) | Remote sensing image cloud detection method and device | |
CN105913070A (en) | Multi-thread significance method based on light field camera | |
CN113436210A (en) | Road image segmentation method fusing context progressive sampling | |
CN112329662B (en) | Multi-view saliency estimation method based on unsupervised learning | |
CN115439926A (en) | Small sample abnormal behavior identification method based on key region and scene depth | |
CN114187520A (en) | Building extraction model and application method thereof | |
CN104463962B (en) | Three-dimensional scene reconstruction method based on GPS information video | |
CN116977895A (en) | Stain detection method and device for universal camera lens and computer equipment | |
CN107392211B (en) | Salient target detection method based on visual sparse cognition | |
Babu et al. | An efficient image dahazing using Googlenet based convolution neural networks | |
CN111680577A (en) | Face detection method and device | |
CN113569684B (en) | Short video scene classification method, system, electronic equipment and storage medium | |
Khoshboresh-Masouleh et al. | Robust building footprint extraction from big multi-sensor data using deep competition network | |
CN114926826A (en) | Scene text detection system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20200414 |