WO2021088300A1 - RGB-D multimodal fusion personnel detection method based on an asymmetric dual-stream network - Google Patents
RGB-D multimodal fusion personnel detection method based on an asymmetric dual-stream network
- Publication number
- WO2021088300A1 (PCT/CN2020/080991)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- rgb
- depth
- image
- feature
- prediction
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/30—Noise filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/56—Extraction of image or video features relating to colour
Definitions
- the invention belongs to the field of computer vision and image processing, and in particular relates to an RGB-D multimodal fusion personnel detection method based on an asymmetric dual-stream network.
- at present, there are two main approaches to personnel detection: personnel detection based on RGB images and personnel detection based on multi-modal image fusion.
- the RGB-based person detection method detects persons using only the RGB image.
- typical approaches include person detection based on the RGB face and person detection based on the RGB whole body.
- the RGB face-based method calibrates facial key points and encodes facial features in the RGB image alone to extract a general feature representation of the face, and uses machine learning or deep learning to train a face detection model.
- the face region of a person in a test image is then selected and located by the bounding rectangle predicted by the model, thereby achieving person detection.
- the RGB whole-body-based person detection method differs from face detection.
- this method extracts, from the RGB image alone, the image region containing the whole body of the person or its most recognizable main body parts for feature representation, and trains a person detection model based on the whole-body image.
- the whole-body region of the person is then selected and located by the bounding rectangle predicted by the model, thereby achieving person detection.
- this method is susceptible to scene limitations and to the imaging resolution of the image. Owing to the optical imaging principle of the visible-light camera, the RGB color image it captures has poor immunity to changes in lighting conditions, especially in low-illumination scenes such as night, rain, snow, and fog.
- in such scenes the image captured by the camera in real time appears dark, or the foreground resembles the background.
- foreground and background information that cannot be clearly distinguished in the image greatly affects the training convergence of the detection model and reduces the accuracy of person detection.
- moreover, the visible-light camera cannot obtain the depth information or thermal radiation information of objects or people in the scene. The captured two-dimensional planar image therefore cannot effectively highlight key information such as the edge contour and texture of an occluded target to address the problem of human occlusion, and the target may even be submerged by similar background information, resulting in a significant drop in the accuracy and recall of person detection.
- the person detection method based on multi-modal image fusion differs from the RGB-based method.
- its input data are images of the same detection scene from different image sources, such as RGB images, depth images, and infrared thermal images; each image source is captured by a different camera device, and each image has its own characteristics.
- the multi-modal image fusion detection method mainly uses the cross fusion of images of different modalities to achieve feature enhancement and complementary association.
- compared with the RGB color image, the depth image and the thermal infrared image are more robust to lighting changes and can be imaged stably under low-illumination conditions such as night; and because the imaging principles of the infrared thermal camera and the depth camera differ from that of the visible-light camera, the two can better capture auxiliary clues such as the edge contour of a partially occluded person, which alleviates the partial occlusion problem to a certain extent.
- deep learning methods are often used to realize feature fusion and associated modeling of the multi-modal information.
- the trained model is therefore suitable for personnel detection under multi-constraint, multi-scene conditions (such as low illumination at night, severe occlusion, and long-distance shooting), with better robustness.
- existing multi-modal image fusion methods mostly rely on traditional hand-crafted multi-modal feature fusion, or use RGB-T or RGB-D (color image + thermal infrared image, color image + depth image) dual-stream neural networks with simple fusion schemes such as additional four-channel fusion, single-scale fusion, or weighted decision fusion.
- the traditional hand-crafted multi-modal fusion method requires human design and extraction of multi-modal features, which relies on subjective experience, is time-consuming and laborious, and cannot achieve end-to-end personnel detection.
- a simple dual-stream neural network multi-modal fusion strategy cannot fully and effectively exploit the fine-grained information such as color and texture of the color image and the semantic information such as edges and depth provided by the depth image to realize the correlation and complementarity between multi-modal data; over-fitting may even occur due to the high complexity of the model, causing the accuracy and recall of personnel detection to decrease instead of increase.
- RGB-T personnel detection also has great limitations in practical applications due to the high cost of the infrared thermal imaging camera.
- one existing method provides a pedestrian detection and identity recognition approach based on RGBD.
- that method comprises: inputting RGB and depth images, preprocessing the images, and converting color channels; then constructing multi-channel features of the RGB and depth images. Specifically, the horizontal and vertical gradients of the RGB image are first computed to construct the RGB histogram-of-oriented-gradients feature, and the horizontal gradient, vertical gradient and depth normal vector direction of the depth image are computed to construct the histogram of oriented gradients of the depth image; together these serve as the RGBD multi-channel features. The scale corresponding to each pixel of the depth image is then quantified to obtain a scale list; a pedestrian detection classifier is trained with the Adaboost algorithm on the multi-channel features; and the classifier searches the scale space corresponding to the scale list to obtain the bounding rectangles of pedestrians, completing pedestrian detection.
- this method needs to manually extract the traditional histogram-of-oriented-gradients features of the RGBD image, which is time-consuming, labor-intensive, occupies large storage space, and cannot achieve end-to-end pedestrian detection; the histogram-of-oriented-gradients feature is relatively simple, making it difficult to extract discriminative features from the RGB and depth images for pedestrian detection; and the method uses only a simple fusion of RGB and depth image features, so it is difficult to fully and effectively mine the fine-grained information such as color and texture of the RGB image and the semantic information such as edges and depth provided by the depth image to realize the correlation and complementation between multi-modal data, which greatly limits improvements in pedestrian detection accuracy.
- the present invention provides an RGBD multi-modal fusion personnel detection method based on an asymmetric dual-stream network; it is not limited to personnel detection and can also be applied to tasks such as general object detection and vehicle detection.
- the RGBD multi-modal fusion personnel detection method based on an asymmetric dual-stream network provided by the present invention is outlined below.
- a representative diagram of the method is shown in Figure 1; it comprises RGBD image acquisition, Depth image preprocessing, RGB feature extraction and Depth feature extraction, RGB multi-scale fusion and Depth multi-scale fusion, multi-modal feature channel re-weighting, and multi-scale personnel prediction. The specific functions of each step are as follows:
- the original RGB image and depth image (hereinafter referred to as the Depth image) are obtained by a camera capable of shooting RGB and depth images simultaneously, and the RGB and Depth images are matched and grouped.
- each group of images consists of an RGB image and a Depth image captured in the same scene, and the grouped and matched RGB and Depth images are output.
- the original RGB image and Depth image can also be obtained from a public RGBD data set.
- RGB feature maps RGB_FP_H, RGB_FP_M and RGB_FP_L represent the low-level color texture, intermediate edge contour and high-level semantic feature representations of the RGB image.
- D_FP_H, D_FP_M and D_FP_L represent the low-level color texture, intermediate edge contour and high-level semantic feature representations of the Depth image.
- in a conventional dual-stream design, the RGB network stream and the Depth network stream have a symmetrical structure, that is, their structures are exactly the same; however, the features contained in the Depth image are simpler than those of the RGB image.
- an asymmetric dual-stream convolutional neural network model is designed to extract the features of RGB image and Depth image.
- Figures 2-1 to 2-4 are a specific embodiment structure of the asymmetric dual-stream convolutional neural network model designed by the method, but are not limited to the structures shown in Figures 2-1 to 2-4.
- the DarkNet-53 described in Figure 2-1 and the MiniDepth-30 described in Figure 2-2 respectively represent the RGB network stream and the Depth network stream, and their network structures are asymmetrical.
- the RGB feature maps RGB_FP_H, RGB_FP_M and RGB_FP_L are input to RGB multi-scale fusion: the obtained RGB_FP_L is first expanded to the same size as RGB_FP_M through an upsampling layer and then channel-merged with RGB_FP_M, so that the high-level semantics of the deep RGB layers are complemented by the features of the shallower layers.
- the final output of RGB multi-scale fusion is the original input RGB_FP_L together with the new feature maps RGB_FP_M and RGB_FP_H obtained after channel merging.
- likewise, the output of Depth multi-scale fusion is the original input D_FP_L together with the new feature maps D_FP_M and D_FP_H obtained after channel merging.
- the RGB feature maps RGB_FP_L, RGB_FP_M and RGB_FP_H from RGB multi-scale fusion and the Depth feature maps D_FP_L, D_FP_M and D_FP_H from Depth multi-scale fusion are paired at corresponding resolutions for multi-modal feature channel re-weighting.
- taking RGB_FP_L and D_FP_L as an example, channel merging is performed first, and the feature map obtained after channel merging is marked as Concat_L; the channel re-weighting module (hereinafter referred to as RW_Module) then linearly weights the feature channels of Concat_L, assigning a weight to each feature channel, and the output channel re-weighted feature map is denoted as RW_L.
- the channel weighting of RGB_FP_M and D_FP_M, RGB_FP_H and D_FP_H is done in the same manner as the RGB_FP_L and D_FP_L.
- the multi-modal feature channel re-weighting finally outputs the channel re-weighted low-, medium- and high-resolution feature maps, marked as RW_L, RW_M and RW_H respectively.
- each prediction point on RW_L has a large receptive field and is used to predict larger targets in the image; each prediction point on RW_M has a medium receptive field and is used to predict medium targets; each prediction point on RW_H has a smaller receptive field and is used to predict smaller targets.
- the predictions of the three scales are merged by non-maximum suppression (NMS) and output as a final result list, where i denotes the ID number of a detected person, N is the total number of person detection results retained in the current image, and each person's bounding rectangle is described by the abscissa and ordinate of its upper-left corner and the abscissa and ordinate of its lower-right corner.
- the present invention addresses the problem that the traditional symmetrical RGBD dual-stream network (RGB network stream + Depth network stream) is prone to losing depth features because the Depth network is excessively deep.
- to this end, the present invention designs an asymmetric RGBD dual-stream convolutional neural network model.
- the Depth network stream is obtained by effectively pruning the RGB network stream; while reducing the number of parameters, this lowers the risk of model over-fitting and improves detection accuracy.
- the RGB network stream and the Depth network stream are used to extract high-, medium- and low-resolution feature maps of the RGB image and the depth image (hereinafter referred to as the Depth image), representing the low-level color texture, intermediate edge contour and high-level semantic feature representations of the RGB and Depth images respectively.
- secondly, a multi-scale fusion structure is designed for each of the RGB network stream and the Depth network stream, so that the high-level semantic features contained in the low-resolution feature map and the intermediate edge contour and low-level color texture features contained in the medium- and high-resolution feature maps complement each other across scales.
- then a multi-modal feature channel re-weighting structure is constructed: the RGB and Depth feature maps are combined, and each feature channel after combination is assigned a learned weight, so that the model automatically learns the contribution proportion of each channel, completes feature selection and removes redundancy, thereby realizing multi-modal fusion of the RGB and Depth features at the corresponding high, medium and low resolutions; finally, the multi-modal features are used for personnel classification and bounding-box regression, which improves the accuracy of personnel detection while ensuring real-time performance, and enhances the robustness of detection under low illumination at night and under occlusion of persons.
- Fig. 1 A representative diagram of a RGBD multi-modal fusion personnel detection method based on an asymmetric dual-stream network provided by the present invention
- Figure 2-1 is a structure diagram of an RGB network stream-DarkNet-53
- Figure 2-2 is a structure diagram of a Depth network stream-MiniDepth-30
- Figure 2-3 is a general structure diagram of a convolution block.
- Figure 2-4 is a general structure diagram of a residual convolution block.
- Fig. 3 is a flowchart of a method for RGBD multi-modal fusion personnel detection based on an asymmetric dual-stream network provided by an embodiment of the present invention
- Figure 4 A general structure diagram of a channel reweighting module provided by an embodiment of the present invention
- FIG. 5 A flowchart of the NMS algorithm provided by an embodiment of the present invention
- S1 Use a camera with the function of simultaneously shooting RGB images and depth images to obtain the original RGB image and the depth image, match and group the images, and output the grouped and matched RGB and Depth images.
- Step S110 Obtain the original RGB image by using a camera with the function of simultaneously shooting RGB images and depth images, and the original RGB images can also be obtained from the public RGBD data set.
- Step S120 Acquire the Depth image synchronously matching the RGB image from step S110, and group the RGB and Depth images.
- each group of images is composed of an RGB image and a Depth image captured in the same scene, and the grouped and matched RGB and Depth images are output.
- the original Depth image obtained from step S120 is used as input. First, part of the noise of the Depth image is eliminated, then holes are filled, and finally the single-channel Depth image is re-encoded into a three-channel image whose channel values are renormalized to 0-255; the encoded and normalized Depth image is output.
- a 5x5 Gaussian filter is used to remove noise;
- the image repair algorithm proposed in [2] is used for hole repair, and the local normal vector and occlusion boundary in the Depth image are extracted, and then global optimization is applied to fill the hole of the Depth image;
- Depth image coding adopts HHA coding [3] (horizontal disparity, height above ground, and the angle of the pixel's surface normal); the three channels are the horizontal disparity, the height above the ground and the angle of the surface normal vector.
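Purely as an illustration of this preprocessing pipeline, the following Python sketch denoises with a 5x5 Gaussian, fills holes, and re-encodes the single-channel Depth map into a three-channel 0-255 image. The cv2.inpaint call and the disparity/height/angle proxy channels are assumptions standing in for the global-optimization hole filling of [2] and the true HHA encoding of [3], which require camera geometry.

```python
import cv2
import numpy as np

def preprocess_depth(depth_raw):
    """Denoise, fill holes, and re-encode a single-channel depth map into a
    three-channel image normalized to 0-255 (stand-in for HHA encoding)."""
    # 5x5 Gaussian filter to suppress depth sensor noise
    depth = cv2.GaussianBlur(depth_raw.astype(np.float32), (5, 5), 0)

    # Hole filling: [2] uses normals/occlusion boundaries and global
    # optimization; plain inpainting is used here only as a placeholder.
    holes = (depth == 0).astype(np.uint8)
    depth_8u = cv2.normalize(depth, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    filled = cv2.inpaint(depth_8u, holes, 3, cv2.INPAINT_TELEA)

    # Re-encode to three channels, each renormalized to 0-255. True HHA needs
    # camera intrinsics and gravity estimation; these channels are crude proxies.
    disparity = 255 - filled.astype(np.float32)               # near = large disparity
    gy, gx = np.gradient(filled.astype(np.float32))
    angle = np.arctan2(gy, gx + 1e-6)                         # surface-orientation proxy
    height = np.cumsum(filled.astype(np.float32), axis=0)     # height-above-ground proxy
    to_u8 = lambda c: cv2.normalize(c, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    return np.dstack([to_u8(disparity), to_u8(height), to_u8(angle)])
```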
- RGB network stream of the asymmetric dual-stream network model adopts DarkNet-53 [4], and the network structure of DarkNet-53 is shown in Figure 2-1.
- the network contains 52 convolutional layers, among which layers L1 ⁇ L10 of the network are used to extract the general features of RGB images and output RGB_FP_C; layers L11 ⁇ L27 are used to extract low-level color texture features of RGB images and output RGB_FP_H; layers L28 ⁇ L44 Used to extract the middle-level edge contour features of RGB images, and output RGB_FP_M; L45 ⁇ L52 layers are used to extract high-level semantic features of RGB images, and output RGB_FP_L.
- the DarkNet-53 model used in this embodiment is only one specific embodiment of the RGB network stream of the asymmetric dual-stream network, and the RGB stream is not limited to the DarkNet-53 model; the following description simply uses DarkNet-53 as an example.
- Step S310 Obtain the original RGB image from S110, extract the general features of the RGB image through layers L1~L10 of the DarkNet-53 network, downsample the image resolution by a factor of K, and output the RGB general feature map RGB_FP_C, whose size becomes one K-th of the original input size.
- the value of K is 8.
- Layers L1 to L10 can be divided into three sub-sampling layers, L1 to L2, L3 to L5, and L6 to L10. Each sub-sampling layer down-samples the input image resolution from the previous layer by 2 times.
- the first sub-sampling layer includes a standard convolution block with a stride of 1 (denoted as Conv0) and a pooled convolution block with a stride of 2 (denoted as Conv0_pool).
- the general structure of the convolution block is shown in Figure 2-3; it includes a standard image convolution layer, a batch normalization layer, and a Leaky_ReLU activation layer. The second sub-sampling layer includes one residual convolution block (denoted as Residual_Block_1) and one pooled convolution block (denoted as Conv1_pool).
- the general structure of the residual convolution block is shown in Figure 2-4; it includes a 1x1xM standard convolution block, a 3x3xN standard convolution block, and an Add module that transfers the identity map of the input to the output, where M is the number of input feature channels and N is the number of output feature channels; here the values of M and N are both 32.
- the third sub-sampling layer includes two residual convolution blocks (denoted as Residual_Block_2_1~2_2) and one pooled convolution block (denoted as Conv2_pool).
- the value of K is 8, and the values of M and N are shown in layers L1 to L10 in Fig. 3-1.
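A minimal PyTorch-style sketch of the two building blocks described above (the standard convolution block of Figure 2-3 and the residual convolution block of Figure 2-4), assuming the DarkNet-style choice of halving the channel count in the 1x1 convolution; layer names and widths are illustrative only.

```python
import torch.nn as nn

class ConvBlock(nn.Module):
    """Standard convolution block: image convolution -> batch norm -> Leaky_ReLU."""
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, stride,
                              padding=kernel_size // 2, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.LeakyReLU(0.1, inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class ResidualBlock(nn.Module):
    """Residual convolution block: 1x1xM conv, 3x3xN conv, and an Add module
    that carries the identity map of the input to the output."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = ConvBlock(channels, channels // 2, kernel_size=1)  # 1x1xM branch
        self.conv2 = ConvBlock(channels // 2, channels, kernel_size=3)  # 3x3xN branch

    def forward(self, x):
        return x + self.conv2(self.conv1(x))  # Add: identity + residual branch
```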
- Step S320 Obtain RGB_FP_C from S310, extract the low-level color texture features of the RGB image through layers L11~L27 of the DarkNet-53 network, downsample the image resolution by a factor of K, and output the RGB high-resolution feature map RGB_FP_H, whose size becomes one K-th of the original input size.
- L11 to L27 are composed of 8 residual convolution blocks (denoted as Residual_Block_3_1 to 3_8) and one pooling convolution block (Conv3_pool).
- the value of K is 2, and the values of M and N are shown in layers L11 to L27 in Figure 3-1.
- Step S330 Obtain RGB_FP_H from S320, extract the mid-level edge contour features of the RGB image through layers L28~L44 of the DarkNet-53 network, downsample the image resolution by a factor of K, and output the RGB medium-resolution feature map RGB_FP_M, whose size becomes one K-th of the original input size.
- L28 to L44 are composed of 8 residual convolution blocks (denoted as Residual_Block_4_1 to 4_8) and one convolution block (Conv4_pool).
- the value of K is 2, and the values of M and N are shown in Layers L28 to L44 in Figure 3-1.
- Step S340 Obtain RGB_FP_M from S330, extract the high-level semantic features of the RGB image through layers L45~L52 of the DarkNet-53 network, downsample the image resolution by a factor of K, and output the RGB low-resolution feature map RGB_FP_L, whose size becomes one K-th of the original input size.
- L45 to L52 are composed of 4 of the residual convolution blocks (denoted as Residual_Block_5_1 to 5_4).
- the value of K is 2, and the values of M and N are shown in Layers L45 to L52 in Figure 3-1.
- S3' Obtain the normalized Depth image from S2, use the Depth network stream of the asymmetric dual-stream network model to extract the general, low-level, intermediate and high-level features of the Depth image at different network levels, and output the corresponding general feature map and the high-, medium- and low-resolution Depth feature maps.
- these are denoted as D_FP_C, D_FP_H, D_FP_M and D_FP_L, and D_FP_H, D_FP_M and D_FP_L are input to S4'.
- the Depth network stream of the asymmetric dual-stream network model is obtained by pruning the model on the basis of the RGB network stream DarkNet-53, which is hereinafter referred to as MiniDepth-30 for short.
- the MiniDepth-30 network can extract semantic features such as the edge contour of the depth image more effectively and clearly, and at the same time achieve the effect of reducing network parameters and preventing over-fitting.
- the network structure of MiniDepth-30 is shown in Figure 2-2.
- the network contains a total of 30 convolutional layers, among which the L1 ⁇ L10 layers of the network are used to extract the general features of the Depth image and output D_FP_C; the L11 ⁇ L17 layers are used to extract the low-level color texture features of the Depth image, and the output D_FP_H; L18 ⁇ L24 layers Used to extract the middle-level edge contour features of the Depth image, output D_FP_M; L25 ⁇ L30 layers are used to extract the high-level semantic features of the Depth image, output D_FP_L.
- the MiniDepth-30 model used in this embodiment is only one specific embodiment of the Depth network stream of the asymmetric dual-stream network, and the Depth stream is not limited to the MiniDepth-30 model.
- Step S310' Obtain the normalized Depth image from S2, extract the general features of the Depth image through layers L1~L10 of the MiniDepth-30 network, downsample the image resolution by a factor of K, and output the general Depth feature map D_FP_C, whose size becomes one K-th of the original input size.
- the L1 to L10 network layers of MiniDepth-30 have the same structure as the L1 to L10 network layers of DarkNet-53 in step S310, and the value of K is 8.
- Step S320' Obtain D_FP_C from step S310', extract the low-level color texture features of the Depth image through layers L11~L17 of the MiniDepth-30 network, downsample the image resolution by a factor of K, and output the Depth high-resolution feature map D_FP_H, whose size becomes one K-th of the original input size.
- L11 to L17 are composed of three of the residual convolution blocks (denoted as Residual_Block_D_3_1 to 3_3) and one of the pooling convolution blocks (Conv3_D_pool).
- the value of K is 2, and the values of M and N are shown in Layers L11 to L17 in Figure 3-2.
- Step S330' Obtain D_FP_H from step S320', extract the mid-level edge contour features of the Depth image through layers L18~L24 of the MiniDepth-30 network, downsample the image resolution by a factor of K, and output the Depth medium-resolution feature map D_FP_M, whose size becomes one K-th of the original input size.
- L18 to L24 are composed of three of the residual convolution blocks (denoted as Residual_Block_D_4_1 to 4_3) and one of the pooling convolution blocks (Conv4_D_pool).
- the value of K is 2, and the values of M and N are shown in layers L18 to L24 in Figure 3-1.
- Step S340' Obtain D_FP_M from step S330', extract the high-level semantic features of the Depth image through layers L25~L30 of the MiniDepth-30 network, downsample the image resolution by a factor of K, and output the Depth low-resolution feature map D_FP_L, whose size becomes one K-th of the original input size.
- L25 to L30 are composed of three of the residual convolution blocks (denoted as Residual_Block_D_5_1 to 5_3).
- the value of K is 2, and the values of M and N are shown in Layers L25 to L30 in Figure 3-1.
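Reusing the ConvBlock and ResidualBlock sketched earlier, the asymmetry between the two streams can be illustrated roughly as follows: both streams share the same stage skeleton and return three feature maps, but the Depth stream stacks fewer residual blocks per stage. The block counts and channel widths only loosely follow the layer tables above and should be read as assumptions.

```python
import torch.nn as nn

class BackboneStream(nn.Module):
    """Shared skeleton: a stem plus five stages; each stage starts with a
    stride-2 pooled convolution block followed by n residual blocks.
    The last three stage outputs play the roles of FP_H, FP_M and FP_L."""
    def __init__(self, blocks, widths=(64, 128, 256, 512, 1024), in_ch=3):
        super().__init__()
        self.stem = ConvBlock(in_ch, 32)                    # Conv0, stride 1
        stages, prev = [], 32
        for n, w in zip(blocks, widths):
            layers = [ConvBlock(prev, w, stride=2)]         # pooled convolution block
            layers += [ResidualBlock(w) for _ in range(n)]
            stages.append(nn.Sequential(*layers))
            prev = w
        self.stages = nn.ModuleList(stages)

    def forward(self, x):
        x = self.stem(x)
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)
        return feats[-3], feats[-2], feats[-1]              # FP_H, FP_M, FP_L

rgb_stream   = BackboneStream(blocks=(1, 2, 8, 8, 4))       # DarkNet-53-like RGB stream
depth_stream = BackboneStream(blocks=(1, 2, 3, 3, 3))       # pruned MiniDepth-30-like stream
```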
- S4 Obtain RGB_FP_H, RGB_FP_M and RGB_FP_L from the S3, use upsampling to expand the feature map size, merge the feature channels of the RGB feature maps with the same resolution to achieve feature fusion, and output the feature maps RGB_FP_H, RGB_FP_M and RGB_FP_L after feature fusion to S5.
- Step S410 The RGB_FP_L obtained in step S340 is up-sampled by a factor of M and then channel-merged with the RGB_FP_M obtained in step S330, realizing the complementary fusion of the deep high-level semantic features of the RGB network with the mid-level edge contour features of the middle layers, and the new feature map RGB_FP_M after feature fusion is output.
- the specific channel merging is as follows: the number of channels of RGB_FP_L is C1 and the number of channels of RGB_FP_M is C2; merging the two gives C3 = C1 + C2 channels, and C3 is the number of channels of the new feature map RGB_FP_M after feature fusion.
- the value of M is 2, and the values of C1, C2 and C3 are 256, 512 and 768, respectively.
- Step S420 The new feature map RGB_FP_M after feature fusion obtained in step S410 is up-sampled and then channel-merged with the RGB_FP_H obtained in step S320, realizing the complementary fusion of the deep high-level semantic features of the RGB network, the mid-level edge contour features of the middle layers, and the shallow low-level color texture features, and the new feature map RGB_FP_H after feature fusion is output.
- the specific channel merging is as follows: the number of channels of RGB_FP_M is C1 and the number of channels of RGB_FP_H is C2; merging the two gives C3 = C1 + C2 channels, and C3 is the number of channels of the new feature map RGB_FP_H after feature fusion.
- the value of M is 2, and the values of C1, C2 and C3 are 128, 256 and 384, respectively.
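A sketch of the per-stream multi-scale fusion of steps S410/S420 (and, identically, S410'/S420' for the Depth stream): the coarser map is up-sampled by a factor of 2 and channel-concatenated with the next finer map. The channel counts quoted in the text (e.g. 256 + 512 = 768 while RGB_FP_L has 1024 channels in S510) suggest that a channel-reduction convolution precedes the upsampling in the actual network; that step is omitted here, and the sketch only shows the upsample-and-concatenate pattern.

```python
import torch
import torch.nn.functional as F

def multiscale_fuse(fp_l, fp_m, fp_h):
    """Per-stream multi-scale fusion: returns (new_fp_l, new_fp_m, new_fp_h).
    The lowest-resolution map is passed through unchanged; the other two are
    channel concatenations of an up-sampled coarser map with a finer map."""
    new_fp_l = fp_l
    up_l = F.interpolate(fp_l, scale_factor=2, mode="nearest")      # upsampling layer, x2
    new_fp_m = torch.cat([up_l, fp_m], dim=1)                       # C3 = C1 + C2 channels
    up_m = F.interpolate(new_fp_m, scale_factor=2, mode="nearest")
    new_fp_h = torch.cat([up_m, fp_h], dim=1)
    return new_fp_l, new_fp_m, new_fp_h
```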
- S4' Obtain D_FP_H, D_FP_M and D_FP_L from S3', use upsampling to expand the size of the feature maps, merge the feature channels of the Depth feature maps with the same resolution to achieve feature fusion, and output the feature maps D_FP_H, D_FP_M and D_FP_L after feature fusion to S5.
- Step S410' The D_FP_L obtained in step S340' is up-sampled by a factor of M and then channel-merged with the D_FP_M obtained in step S330', realizing the complementary fusion of the deep high-level semantic features of the Depth network with the mid-level edge contour features of the middle layers, and the new feature map D_FP_M after feature fusion is output.
- the specific channel merging is as follows: the number of channels of D_FP_L is C1 and the number of channels of D_FP_M is C2; merging the two gives C3 = C1 + C2 channels, and C3 is the number of channels of the new feature map D_FP_M after feature fusion.
- the value of M is 2, and the values of C1, C2 and C3 are 256, 512 and 768, respectively.
- Step S420' The new feature map D_FP_M after feature fusion obtained in step S410' is up-sampled and then channel-merged with the D_FP_H obtained in step S320', realizing the complementary fusion of the deep high-level semantic features of the Depth network, the mid-level edge contour features of the middle layers, and the shallow low-level color texture features, and the new feature map D_FP_H after feature fusion is output.
- the specific channel merging is as follows: the number of channels of D_FP_M is C1 and the number of channels of D_FP_H is C2; merging the two gives C3 = C1 + C2 channels, and C3 is the number of channels of the new feature map D_FP_H after feature fusion.
- the value of M is 2, and the values of C1, C2 and C3 are 128, 256 and 384, respectively.
- S5 Obtain the new feature maps RGB_FP_H, RGB_FP_M and RGB_FP_L after feature fusion from S4, and the new feature maps D_FP_H, D_FP_M and D_FP_L after feature fusion from S4', and perform feature channel merging at the corresponding equal resolutions.
- the merged feature maps are marked as Concat_L, Concat_M and Concat_H respectively; the channel re-weighting module (hereinafter referred to as RW_Module) is then applied to linearly weight Concat_L, Concat_M and Concat_H respectively, and the channel re-weighted low-, medium- and high-resolution feature maps are output, denoted as RW_L, RW_M and RW_H respectively.
- Step S510 Obtain RGB_FP_L and D_FP_L from S4 and S4', first merge the feature channels of RGB_FP_L and D_FP_L to obtain Concat_L, realizing the complementary fusion of RGB and Depth multi-modal information in the deep layers of the network; then apply the channel re-weighting module RW_Module to linearly weight Concat_L, assigning a weight to each feature channel, and output the channel re-weighted feature map RW_L.
- taking the channel weighting of RGB_FP_L and D_FP_L as an example, the general structure of the channel re-weighting module provided in this embodiment is shown in Figure 4.
- the number of channels of RGB_FP_L is C1 and the number of channels of D_FP_L is C2, so Concat_L has C3 = C1 + C2 feature channels.
- Concat_L passes in turn through one 1x1 average-pooling layer, one standard convolution layer composed of C3/s 1x1 convolution kernels (s is the reduction step size), one standard convolution layer composed of C3 1x1 convolution kernels, and one Sigmoid layer, yielding C3 weight values ranging from 0 to 1; finally, the obtained C3 weight values are multiplied with the C3 feature channels of Concat_L, so that each feature channel is assigned a weight, and the weighted C3 feature channels are output as RW_L.
- the values of C1, C2 and C3 are 1024, 1024 and 2048 respectively, and the value of the reduction step size s is 16.
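The channel re-weighting module can be sketched in PyTorch roughly as below, following the description of Figure 4: global average pooling, a C3/s 1x1 convolution, a C3 1x1 convolution, a Sigmoid, and channel-wise multiplication. The ReLU between the two 1x1 convolutions is an assumption borrowed from squeeze-and-excitation blocks; the text itself does not specify the intermediate activation.

```python
import torch
import torch.nn as nn

class ChannelReweight(nn.Module):
    """RW_Module sketch: learn one weight in [0, 1] per feature channel of the
    concatenated RGB+Depth map and rescale the channels with those weights."""
    def __init__(self, channels, s=16):                     # s: reduction step size
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)                 # 1x1 average pooling
        self.fc1 = nn.Conv2d(channels, channels // s, kernel_size=1)
        self.act = nn.ReLU(inplace=True)                    # assumed activation
        self.fc2 = nn.Conv2d(channels // s, channels, kernel_size=1)
        self.gate = nn.Sigmoid()                            # weights in 0..1

    def forward(self, concat):
        w = self.gate(self.fc2(self.act(self.fc1(self.pool(concat)))))
        return concat * w                                   # per-channel re-weighting

# Usage sketch for the low-resolution branch (C1 = C2 = 1024, C3 = 2048):
# concat_l = torch.cat([rgb_fp_l, d_fp_l], dim=1)
# rw_l = ChannelReweight(2048)(concat_l)
```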
- Step S520 Obtain RGB_FP_M from step S410 and D_FP_M from step S410', first merge the feature channels of RGB_FP_M and D_FP_M to obtain Concat_M, realizing the complementary fusion of RGB and Depth multi-modal information in the middle layers of the network; then apply the channel re-weighting module RW_Module to linearly weight Concat_M, assign a weight to each feature channel, and output the channel re-weighted feature map RW_M.
- the channel weighting of RGB_FP_M and D_FP_M is carried out in the same way as that of RGB_FP_L and D_FP_L in step S510, where the values of C1, C2 and C3 are 512, 512 and 1024 respectively, and the reduction step size s is 16.
- Step S530 Obtain RGB_FP_H from step S420 and D_FP_H from step S420', first merge the feature channels of RGB_FP_H and D_FP_H to obtain Concat_H, realizing the complementary fusion of RGB and Depth multi-modal information in the shallow layers of the network; then apply the channel re-weighting module RW_Module to linearly weight Concat_H, assign a weight to each feature channel, and output the channel re-weighted feature map RW_H.
- the channel weighting of RGB_FP_H and D_FP_H is carried out in the same way as that of RGB_FP_L and D_FP_L in step S510, where the values of C1, C2 and C3 are 256, 256 and 512 respectively, and the reduction step size s is 16.
- S6 Obtain the channel re-weighted feature maps RW_L, RW_M and RW_H from S5, and perform classification and bounding-box coordinate regression on each to obtain prediction results for persons of larger, medium and smaller sizes at the three different scales.
- the predictions of the three scales are merged by non-maximum suppression (NMS) and output as a final result list, where i denotes the ID number of a detected person, N is the total number of person detection results retained in the current image, and each person's bounding rectangle is described by the abscissa and ordinate of its upper-left corner and the abscissa and ordinate of its lower-right corner.
- Step S610 Obtain the channel re-weighted low-resolution feature map RW_L from step S510, pass it to the SoftMax classification layer and the coordinate regression layer, and output the category confidence scores for predicting larger-size persons under the low-resolution feature map together with the upper-left and lower-right corner coordinates of their rectangular borders.
- the subscript L represents the prediction result under the low-resolution feature map.
- Step S620 Obtain the channel re-weighted medium-resolution feature map RW_M from step S520, pass it to the SoftMax classification layer and the coordinate regression layer, and output the category confidence scores for predicting medium-size persons under the medium-resolution feature map together with the upper-left and lower-right corner coordinates of their rectangular borders.
- the subscript M represents the prediction result under the medium-resolution feature map.
- Step S630 Obtain the channel re-weighted high-resolution feature map RW_H from step S530, pass it to the SoftMax classification layer and the coordinate regression layer, and output the category confidence scores for predicting smaller-size persons under the high-resolution feature map together with the upper-left and lower-right corner coordinates of their rectangular borders.
- the subscript H represents the prediction result under the high-resolution feature map.
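For each of the three re-weighted feature maps, the prediction head of steps S610-S630 can be sketched as a pair of 1x1 convolutions, one feeding the SoftMax classification layer and one regressing the corner coordinates. Treating the head as producing exactly these outputs at every prediction point is a simplifying assumption; the description only states that each scale outputs category confidence scores and the upper-left/lower-right corner coordinates.

```python
import torch
import torch.nn as nn

class ScaleHead(nn.Module):
    """Per-scale prediction head sketch: SoftMax person/background scores plus
    upper-left and lower-right corner coordinates at every prediction point."""
    def __init__(self, in_channels, num_classes=2):
        super().__init__()
        self.cls = nn.Conv2d(in_channels, num_classes, kernel_size=1)
        self.reg = nn.Conv2d(in_channels, 4, kernel_size=1)   # x1, y1, x2, y2

    def forward(self, feat):
        scores = torch.softmax(self.cls(feat), dim=1)          # SoftMax classification layer
        corners = self.reg(feat)                               # coordinate regression layer
        return scores, corners

# e.g. ScaleHead(2048) for RW_L, ScaleHead(1024) for RW_M, ScaleHead(512) for RW_H
```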
- Step S640 Obtain the category confidence scores of persons of larger, medium and smaller sizes, together with the upper-left and lower-right coordinates of the rectangular borders, from steps S610, S620 and S630, and merge the predictions of the three scales with the NMS algorithm to remove duplicate detections.
- the flow chart of the NMS algorithm is shown in Figure 5.
- Step S640-1 Obtain the category confidence scores of persons of larger, medium and smaller sizes, together with the upper-left and lower-right coordinates of the rectangular borders, from steps S610, S620 and S630; summarize the prediction results of the three scales, filter the prediction boxes with the confidence threshold, keep the prediction boxes whose category confidence scores are greater than the confidence threshold, and add them to the prediction list.
- the confidence threshold is set to 0.3.
- Step S640-2 From the prediction list obtained in step S640-1, sort the unprocessed prediction frames in descending order of confidence score, and output the prediction list sorted in descending order.
- Step S640-3 Obtain the descending-order prediction list from step S640-2, select the frame with the maximum confidence score as the current reference frame, add the category confidence score and frame coordinates of the current reference frame to the final result list, remove the reference frame from the prediction list, and calculate the intersection-over-union (IoU) between every remaining predicted frame and the current reference frame.
- Step S640-4 Obtain the prediction list and the IoU values between all frames in the prediction list and the reference frame from step S640-3. If the IoU of the current frame is greater than the preset NMS threshold, it is considered a duplicate of the target in the reference frame and is removed from the list of predicted bounding boxes; otherwise the current bounding box is kept. Output the filtered prediction list.
- Step S640-5 Obtain the filtered prediction list from step S640-4. If all frames in the prediction list have been processed, that is, the prediction list is empty, the algorithm ends and the final result list is returned; otherwise, if there are still unprocessed frames in the current prediction list, return to step S640-2 and repeat the algorithm flow.
- Step S640-6 Following step S640-5, when there are no unprocessed prediction frames left in the prediction list, output the final result list as the final retained personnel detection results.
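The S640 flow amounts to confidence filtering followed by greedy non-maximum suppression, which can be sketched in plain Python as below. The 0.3 confidence threshold comes from the description; the 0.5 NMS threshold is an assumed value, since the text only says a preset NMS threshold is used.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def nms(boxes, scores, conf_thresh=0.3, nms_thresh=0.5):
    """Steps S640-1..6: keep boxes above the confidence threshold, then
    repeatedly take the highest-scoring box as the reference frame and drop
    every remaining box whose IoU with it exceeds the NMS threshold."""
    candidates = [i for i, s in enumerate(scores) if s > conf_thresh]   # S640-1
    candidates.sort(key=lambda i: scores[i], reverse=True)              # S640-2
    final = []
    while candidates:                                                   # S640-3..5
        ref = candidates.pop(0)
        final.append(ref)
        candidates = [i for i in candidates
                      if iou(boxes[i], boxes[ref]) <= nms_thresh]       # S640-4
    return final                                                        # S640-6: kept indices
```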
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Human Computer Interaction (AREA)
- Image Analysis (AREA)
Abstract
The present invention relates to the field of computer vision and image processing, and concerns an RGB-D multimodal fusion personnel detection method based on an asymmetric dual-stream network. The method comprises the steps of RGB-D image acquisition, depth image preprocessing, RGB feature extraction and depth feature extraction, RGB multi-scale fusion and depth multi-scale fusion, multimodal feature channel re-weighting, and multi-scale personnel prediction. According to the present invention, an asymmetric RGB-D dual-stream convolutional neural network model is designed to solve the problem that conventional symmetric RGB-D dual-stream networks are prone to depth feature loss. Multi-scale fusion structures are respectively designed for the RGB-D dual-stream networks, so that multi-scale information complementation is achieved. A multimodal re-weighting structure is constructed, RGB and depth feature maps are combined, and a weighted assignment is performed on each combined feature channel so that the model automatically learns the contribution proportion. Personnel classification and frame regression are performed using the multimodal features, so that the accuracy of personnel detection is improved while real-time performance is guaranteed, and the robustness of detection under low illumination at night and under occlusion of personnel is enhanced.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911090619.5A CN110956094B (zh) | 2019-11-09 | 2019-11-09 | 一种基于非对称双流网络的rgb-d多模态融合人员检测方法 |
CN201911090619.5 | 2019-11-09 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021088300A1 true WO2021088300A1 (fr) | 2021-05-14 |
Family
ID=69977120
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/080991 WO2021088300A1 (fr) | 2019-11-09 | 2020-03-25 | Procédé de détection de personnel par fusion multimodale rvb-d reposant sur un réseau asymétrique à double flux |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110956094B (fr) |
WO (1) | WO2021088300A1 (fr) |
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111767882B (zh) * | 2020-07-06 | 2024-07-19 | 江南大学 | 一种基于改进yolo模型的多模态行人检测方法 |
CN111968058B (zh) * | 2020-08-25 | 2023-08-04 | 北京交通大学 | 一种低剂量ct图像降噪方法 |
CN111986240A (zh) * | 2020-09-01 | 2020-11-24 | 交通运输部水运科学研究所 | 基于可见光和热成像数据融合的落水人员检测方法及系统 |
CN112434654B (zh) * | 2020-12-07 | 2022-09-13 | 安徽大学 | 一种基于对称卷积神经网络的跨模态行人重识别方法 |
CN113221659B (zh) * | 2021-04-13 | 2022-12-23 | 天津大学 | 一种基于不确定感知网络的双光车辆检测方法及装置 |
CN113240631B (zh) * | 2021-04-22 | 2023-12-12 | 北京中科慧眼科技有限公司 | 基于rgb-d融合信息的路面检测方法、系统和智能终端 |
CN113360712B (zh) * | 2021-05-21 | 2022-12-06 | 北京百度网讯科技有限公司 | 视频表示的生成方法、装置和电子设备 |
CN113536978B (zh) * | 2021-06-28 | 2023-08-18 | 杭州电子科技大学 | 一种基于显著性的伪装目标检测方法 |
CN113887332B (zh) * | 2021-09-13 | 2024-04-05 | 华南理工大学 | 一种基于多模态融合的肌肤作业安全监测方法 |
CN113902903B (zh) * | 2021-09-30 | 2024-08-02 | 北京工业大学 | 一种基于下采样的双注意力多尺度融合方法 |
CN113887425B (zh) * | 2021-09-30 | 2024-04-12 | 北京工业大学 | 一种面向低算力运算装置的轻量化物体检测方法与系统 |
CN114581838B (zh) * | 2022-04-26 | 2022-08-26 | 阿里巴巴达摩院(杭州)科技有限公司 | 图像处理方法、装置和云设备 |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105956532B (zh) * | 2016-04-25 | 2019-05-21 | 大连理工大学 | 一种基于多尺度卷积神经网络的交通场景分类方法 |
CN110309747B (zh) * | 2019-06-21 | 2022-09-16 | 大连理工大学 | 一种支持多尺度快速深度行人检测模型 |
-
2019
- 2019-11-09 CN CN201911090619.5A patent/CN110956094B/zh active Active
-
2020
- 2020-03-25 WO PCT/CN2020/080991 patent/WO2021088300A1/fr active Application Filing
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140107842A1 (en) * | 2012-10-16 | 2014-04-17 | Electronics And Telecommunications Research Institute | Human-tracking method and robot apparatus for performing the same |
CN107045630A (zh) * | 2017-04-24 | 2017-08-15 | 杭州司兰木科技有限公司 | 一种基于rgbd的行人检测和身份识别方法及系统 |
WO2019162241A1 (fr) * | 2018-02-21 | 2019-08-29 | Robert Bosch Gmbh | Détection d'objet en temps réel à l'aide de capteurs de profondeur |
CN108734210A (zh) * | 2018-05-17 | 2018-11-02 | 浙江工业大学 | 一种基于跨模态多尺度特征融合的对象检测方法 |
CN109543697A (zh) * | 2018-11-16 | 2019-03-29 | 西北工业大学 | 一种基于深度学习的rgbd图像目标识别方法 |
CN109598301A (zh) * | 2018-11-30 | 2019-04-09 | 腾讯科技(深圳)有限公司 | 检测区域去除方法、装置、终端和存储介质 |
Cited By (81)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113468954A (zh) * | 2021-05-20 | 2021-10-01 | 西安电子科技大学 | 基于多通道下局部区域特征的人脸伪造检测方法 |
CN113468954B (zh) * | 2021-05-20 | 2023-04-18 | 西安电子科技大学 | 基于多通道下局部区域特征的人脸伪造检测方法 |
CN113313688A (zh) * | 2021-05-28 | 2021-08-27 | 武汉乾峯智能科技有限公司 | 一种含能材料药桶识别方法、系统、电子设备及存储介质 |
CN113313688B (zh) * | 2021-05-28 | 2022-08-05 | 武汉乾峯智能科技有限公司 | 一种含能材料药桶识别方法、系统、电子设备及存储介质 |
CN113362224A (zh) * | 2021-05-31 | 2021-09-07 | 维沃移动通信有限公司 | 图像处理方法、装置、电子设备及可读存储介质 |
CN113298094B (zh) * | 2021-06-10 | 2022-11-04 | 安徽大学 | 一种基于模态关联与双感知解码器的rgb-t的显著性目标检测方法 |
CN113298094A (zh) * | 2021-06-10 | 2021-08-24 | 安徽大学 | 一种基于模态关联与双感知解码器的rgb-t的显著性目标检测方法 |
CN113538615B (zh) * | 2021-06-29 | 2024-01-09 | 中国海洋大学 | 基于双流生成器深度卷积对抗生成网络的遥感图像上色方法 |
CN113538615A (zh) * | 2021-06-29 | 2021-10-22 | 中国海洋大学 | 基于双流生成器深度卷积对抗生成网络的遥感图像上色方法 |
CN113361466A (zh) * | 2021-06-30 | 2021-09-07 | 江南大学 | 一种基于多模态交叉指导学习的多光谱目标检测方法 |
CN113361466B (zh) * | 2021-06-30 | 2024-03-12 | 江南大学 | 一种基于多模态交叉指导学习的多光谱目标检测方法 |
CN113486781A (zh) * | 2021-07-02 | 2021-10-08 | 国网电力科学研究院有限公司 | 一种基于深度学习模型的电力巡检方法及装置 |
CN113486781B (zh) * | 2021-07-02 | 2023-10-24 | 国网电力科学研究院有限公司 | 一种基于深度学习模型的电力巡检方法及装置 |
CN113537326A (zh) * | 2021-07-06 | 2021-10-22 | 安徽大学 | 一种rgb-d图像显著目标检测方法 |
CN113569723A (zh) * | 2021-07-27 | 2021-10-29 | 北京京东尚科信息技术有限公司 | 一种人脸检测方法、装置、电子设备及存储介质 |
CN113658134A (zh) * | 2021-08-13 | 2021-11-16 | 安徽大学 | 一种多模态对齐校准的rgb-d图像显著目标检测方法 |
CN113657521B (zh) * | 2021-08-23 | 2023-09-19 | 天津大学 | 一种分离图像中两种互斥成分的方法 |
CN113657521A (zh) * | 2021-08-23 | 2021-11-16 | 天津大学 | 一种分离图像中两种互斥成分的方法 |
CN113848234A (zh) * | 2021-09-16 | 2021-12-28 | 南京航空航天大学 | 一种基于多模态信息的航空复合材料的检测方法 |
CN113989245A (zh) * | 2021-10-28 | 2022-01-28 | 杭州中科睿鉴科技有限公司 | 多视角多尺度图像篡改检测方法 |
CN113989245B (zh) * | 2021-10-28 | 2023-01-24 | 杭州中科睿鉴科技有限公司 | 多视角多尺度图像篡改检测方法 |
CN114037938A (zh) * | 2021-11-09 | 2022-02-11 | 桂林电子科技大学 | 一种基于NFL-Net的低照度目标检测方法 |
CN114037938B (zh) * | 2021-11-09 | 2024-03-26 | 桂林电子科技大学 | 一种基于NFL-Net的低照度目标检测方法 |
CN113902783B (zh) * | 2021-11-19 | 2024-04-30 | 东北大学 | 一种融合三模态图像的显著性目标检测系统及方法 |
CN113902783A (zh) * | 2021-11-19 | 2022-01-07 | 东北大学 | 一种融合三模态图像的显著性目标检测系统及方法 |
CN114202646A (zh) * | 2021-11-26 | 2022-03-18 | 深圳市朗驰欣创科技股份有限公司 | 一种基于深度学习的红外图像吸烟检测方法与系统 |
CN114119965A (zh) * | 2021-11-30 | 2022-03-01 | 齐鲁工业大学 | 一种道路目标检测方法及系统 |
CN114170174A (zh) * | 2021-12-02 | 2022-03-11 | 沈阳工业大学 | 基于RGB-D图像的CLANet钢轨表面缺陷检测系统及方法 |
CN114170174B (zh) * | 2021-12-02 | 2024-01-23 | 沈阳工业大学 | 基于RGB-D图像的CLANet钢轨表面缺陷检测系统及方法 |
CN114202663A (zh) * | 2021-12-03 | 2022-03-18 | 大连理工大学宁波研究院 | 一种基于彩色图像和深度图像的显著度检测方法 |
CN114372986A (zh) * | 2021-12-30 | 2022-04-19 | 深圳大学 | 注意力引导多模态特征融合的图像语义分割方法及装置 |
CN114372986B (zh) * | 2021-12-30 | 2024-05-24 | 深圳大学 | 注意力引导多模态特征融合的图像语义分割方法及装置 |
CN114359228A (zh) * | 2022-01-06 | 2022-04-15 | 深圳思谋信息科技有限公司 | 物体表面缺陷检测方法、装置、计算机设备和存储介质 |
CN114049508A (zh) * | 2022-01-12 | 2022-02-15 | 成都无糖信息技术有限公司 | 一种基于图片聚类和人工研判的诈骗网站识别方法及系统 |
CN114049508B (zh) * | 2022-01-12 | 2022-04-01 | 成都无糖信息技术有限公司 | 一种基于图片聚类和人工研判的诈骗网站识别方法及系统 |
CN114445442B (zh) * | 2022-01-28 | 2022-12-02 | 杭州电子科技大学 | 基于非对称交叉融合的多光谱图像语义分割方法 |
CN114445442A (zh) * | 2022-01-28 | 2022-05-06 | 杭州电子科技大学 | 基于非对称交叉融合的多光谱图像语义分割方法 |
CN114219807A (zh) * | 2022-02-22 | 2022-03-22 | 成都爱迦飞诗特科技有限公司 | 乳腺超声检查图像分级方法、装置、设备和存储介质 |
CN114708295B (zh) * | 2022-04-02 | 2024-04-16 | 华南理工大学 | 一种基于Transformer的物流包裹分离方法 |
CN114708295A (zh) * | 2022-04-02 | 2022-07-05 | 华南理工大学 | 一种基于Transformer的物流包裹分离方法 |
CN114998826A (zh) * | 2022-05-12 | 2022-09-02 | 西北工业大学 | 密集场景下的人群检测方法 |
CN114663436A (zh) * | 2022-05-25 | 2022-06-24 | 南京航空航天大学 | 一种基于深度学习的跨尺度缺陷检测方法 |
CN115100409B (zh) * | 2022-06-30 | 2024-04-26 | 温州大学 | 一种基于孪生网络的视频人像分割算法 |
CN115100409A (zh) * | 2022-06-30 | 2022-09-23 | 温州大学 | 一种基于孪生网络的视频人像分割算法 |
CN114821488A (zh) * | 2022-06-30 | 2022-07-29 | 华东交通大学 | 基于多模态网络的人群计数方法、系统及计算机设备 |
CN115909182A (zh) * | 2022-08-09 | 2023-04-04 | 哈尔滨市科佳通用机电股份有限公司 | 一种动车组闸片磨损故障图像识别方法 |
CN115909182B (zh) * | 2022-08-09 | 2023-08-08 | 哈尔滨市科佳通用机电股份有限公司 | 一种动车组闸片磨损故障图像识别方法 |
CN115273154B (zh) * | 2022-09-26 | 2023-01-17 | 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) | 基于边缘重构的热红外行人检测方法、系统及存储介质 |
CN115273154A (zh) * | 2022-09-26 | 2022-11-01 | 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) | 基于边缘重构的热红外行人检测方法、系统及存储介质 |
CN115731473B (zh) * | 2022-10-28 | 2024-05-31 | 南开大学 | 面向农田植物非正常变化的遥感图像分析方法 |
CN115731473A (zh) * | 2022-10-28 | 2023-03-03 | 南开大学 | 面向农田植物非正常变化的遥感图像分析方法 |
CN115641507A (zh) * | 2022-11-07 | 2023-01-24 | 哈尔滨工业大学 | 基于自适应多层级融合的遥感图像小尺度面目标检测方法 |
CN115937791A (zh) * | 2023-01-10 | 2023-04-07 | 华南农业大学 | 一种适用于多种养殖模式的家禽计数方法及其计数装置 |
CN115984672A (zh) * | 2023-03-17 | 2023-04-18 | 成都纵横自动化技术股份有限公司 | 基于深度学习的高清图像内小目标的检测方法和装置 |
CN116343308B (zh) * | 2023-04-04 | 2024-02-09 | 湖南交通工程学院 | 一种融合人脸图像检测方法、装置、设备及存储介质 |
CN116343308A (zh) * | 2023-04-04 | 2023-06-27 | 湖南交通工程学院 | 一种融合人脸图像检测方法、装置、设备及存储介质 |
CN116311077B (zh) * | 2023-04-10 | 2023-11-07 | 东北大学 | 一种基于显著性图的多光谱融合的行人检测方法及装置 |
CN116311077A (zh) * | 2023-04-10 | 2023-06-23 | 东北大学 | 一种基于显著性图的多光谱融合的行人检测方法及装置 |
CN116206133A (zh) * | 2023-04-25 | 2023-06-02 | 山东科技大学 | 一种rgb-d显著性目标检测方法 |
CN116206133B (zh) * | 2023-04-25 | 2023-09-05 | 山东科技大学 | 一种rgb-d显著性目标检测方法 |
CN116823908A (zh) * | 2023-06-26 | 2023-09-29 | 北京邮电大学 | 一种基于多尺度特征相关性增强的单目图像深度估计方法 |
CN116758117A (zh) * | 2023-06-28 | 2023-09-15 | 云南大学 | 可见光与红外图像下的目标跟踪方法及系统 |
CN116758117B (zh) * | 2023-06-28 | 2024-02-09 | 云南大学 | 可见光与红外图像下的目标跟踪方法及系统 |
CN116519106B (zh) * | 2023-06-30 | 2023-09-15 | 中国农业大学 | 一种用于测定生猪体重的方法、装置、存储介质和设备 |
CN116519106A (zh) * | 2023-06-30 | 2023-08-01 | 中国农业大学 | 一种用于测定生猪体重的方法、装置、存储介质和设备 |
CN116715560A (zh) * | 2023-08-10 | 2023-09-08 | 吉林隆源农业服务有限公司 | 控释肥料的智能化制备方法及其系统 |
CN116715560B (zh) * | 2023-08-10 | 2023-11-14 | 吉林隆源农业服务有限公司 | 控释肥料的智能化制备方法及其系统 |
CN117475182B (zh) * | 2023-09-13 | 2024-06-04 | 江南大学 | 基于多特征聚合的立体匹配方法 |
CN117475182A (zh) * | 2023-09-13 | 2024-01-30 | 江南大学 | 基于多特征聚合的立体匹配方法 |
CN117237343B (zh) * | 2023-11-13 | 2024-01-30 | 安徽大学 | 半监督rgb-d图像镜面检测方法、存储介质及计算机设备 |
CN117237343A (zh) * | 2023-11-13 | 2023-12-15 | 安徽大学 | 半监督rgb-d图像镜面检测方法、存储介质及计算机设备 |
CN117350926A (zh) * | 2023-12-04 | 2024-01-05 | 北京航空航天大学合肥创新研究院 | 一种基于目标权重的多模态数据增强方法 |
CN117350926B (zh) * | 2023-12-04 | 2024-02-13 | 北京航空航天大学合肥创新研究院 | 一种基于目标权重的多模态数据增强方法 |
CN117392572A (zh) * | 2023-12-11 | 2024-01-12 | 四川能投发展股份有限公司 | 一种基于无人机巡检的输电杆塔鸟巢检测方法 |
CN117392572B (zh) * | 2023-12-11 | 2024-02-27 | 四川能投发展股份有限公司 | 一种基于无人机巡检的输电杆塔鸟巢检测方法 |
CN117635953B (zh) * | 2024-01-26 | 2024-04-26 | 泉州装备制造研究所 | 一种基于多模态无人机航拍的电力系统实时语义分割方法 |
CN117635953A (zh) * | 2024-01-26 | 2024-03-01 | 泉州装备制造研究所 | 一种基于多模态无人机航拍的电力系统实时语义分割方法 |
CN118172615A (zh) * | 2024-05-14 | 2024-06-11 | 山西新泰富安新材有限公司 | 用于降低加热炉烧损率的方法 |
CN118982488A (zh) * | 2024-07-19 | 2024-11-19 | 南京审计大学 | 一种基于全分辨率语义指导的多尺度低光图像增强方法 |
CN118553002A (zh) * | 2024-07-29 | 2024-08-27 | 浙江幸福轨道交通运营管理有限公司 | 基于云平台四层架构afc系统的人脸识别系统及方法 |
CN119049091A (zh) * | 2024-10-30 | 2024-11-29 | 杭州电子科技大学 | 一种基于动态检测信度更新的人体标识物识别方法 |
Also Published As
Publication number | Publication date |
---|---|
CN110956094B (zh) | 2023-12-01 |
CN110956094A (zh) | 2020-04-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021088300A1 (fr) | RGB-D multimodal fusion personnel detection method based on an asymmetric dual-stream network | |
CN109819208B (zh) | 一种基于人工智能动态监控的密集人群安防监控管理方法 | |
CN107253485B (zh) | 异物侵入检测方法及异物侵入检测装置 | |
CN112288008B (zh) | 一种基于深度学习的马赛克多光谱图像伪装目标检测方法 | |
AU2006252252B2 (en) | Image processing method and apparatus | |
CN111931684A (zh) | 一种基于视频卫星数据鉴别特征的弱小目标检测方法 | |
CN110363140A (zh) | 一种基于红外图像的人体动作实时识别方法 | |
CN107622258A (zh) | 一种结合静态底层特征和运动信息的快速行人检测方法 | |
CN110309781A (zh) | 基于多尺度光谱纹理自适应融合的房屋损毁遥感识别方法 | |
Zin et al. | Fusion of infrared and visible images for robust person detection | |
CN103049751A (zh) | 一种改进的加权区域匹配高空视频行人识别方法 | |
CN106295636A (zh) | 基于多特征融合级联分类器的消防通道车辆检测方法 | |
CN117152443B (zh) | 一种基于语义前导指引的图像实例分割方法及系统 | |
CN112926506A (zh) | 一种基于卷积神经网络的非受控人脸检测方法及系统 | |
CN112084928B (zh) | 基于视觉注意力机制和ConvLSTM网络的道路交通事故检测方法 | |
CN114119586A (zh) | 一种基于机器视觉的飞机蒙皮缺陷智能检测方法 | |
CN111582074A (zh) | 一种基于场景深度信息感知的监控视频树叶遮挡检测方法 | |
CN105513053A (zh) | 一种用于视频分析中背景建模方法 | |
CN114648714A (zh) | 一种基于yolo的车间规范行为的监测方法 | |
CN114519819A (zh) | 一种基于全局上下文感知的遥感图像目标检测方法 | |
CN113177439B (zh) | 一种行人翻越马路护栏检测方法 | |
Zhu et al. | Towards automatic wild animal detection in low quality camera-trap images using two-channeled perceiving residual pyramid networks | |
CN115376202A (zh) | 一种基于深度学习的电梯轿厢内乘客行为识别方法 | |
CN111274964A (zh) | 一种基于无人机视觉显著性分析水面污染物的检测方法 | |
CN107045630B (zh) | 一种基于rgbd的行人检测和身份识别方法及系统 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 20884981 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 20884981 Country of ref document: EP Kind code of ref document: A1 |