
CN112906797A - Plane grabbing detection method based on computer vision and deep learning - Google Patents

Plane grabbing detection method based on computer vision and deep learning

Info

Publication number
CN112906797A
CN112906797A
Authority
CN
China
Prior art keywords
grasping
grabbing
image
training
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110207871.0A
Other languages
Chinese (zh)
Other versions
CN112906797B (en)
Inventor
石敏
路昊
朱登明
李兆歆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
North China Electric Power University
Original Assignee
North China Electric Power University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by North China Electric Power University filed Critical North China Electric Power University
Priority to CN202110207871.0A priority Critical patent/CN112906797B/en
Publication of CN112906797A publication Critical patent/CN112906797A/en
Application granted granted Critical
Publication of CN112906797B publication Critical patent/CN112906797B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20112Image segmentation details
    • G06T2207/20132Image cropping
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract


The invention, in the field of mechanical-arm grasp computation, discloses a plane grasping detection method based on computer vision and deep learning. Its steps include: collecting or self-making a grasping data set and performing specific data enhancement; completing the depth map information with a depth completion algorithm, then performing depth information fusion, uniform cropping, and training/validation splitting on the data set; training a grasping detection model that takes real image data as network input and outputs a grasp quality score and a five-dimensional grasp box representation, optimized with the back-propagation algorithm and a standard gradient-based optimization algorithm so that the difference between the detected grasp box and the ground truth is minimized; and, by ranking the outputs, converting the best grasp box into its four vertex coordinates for visualization and finally mapping it to real-world coordinates. The invention addresses the difficulty empirical grasping detection methods have in achieving accuracy while pursuing generalization, and the difficulty of guaranteeing real-time performance in real scenes.


Description

Plane grabbing detection method based on computer vision and deep learning
Technical Field
The invention belongs to the technical field of mechanical arm grabbing, and particularly relates to a plane grabbing detection method based on computer vision and deep learning.
Background
Robotic grasping still falls far short of human performance and remains an open problem in robotics. When people see novel objects, they instinctively grasp any unknown object quickly and easily based on their own experience. Although much work on robotic grasping and manipulation has appeared in recent years, real-time grasp detection remains a challenge. The robotic grasping problem can be divided into three successive stages: grasp detection, trajectory planning, and execution. Grasp detection is a visual recognition problem in which a robot uses its sensors to detect graspable objects in its environment. The sensor perceiving the robot's environment is typically a 3D vision system or an RGB-D camera. The key task is to predict potential grasps from the sensor information and map pixel values to real-world coordinates. This is a critical step, since the subsequent stages depend on the coordinates computed here. The computed real-world coordinates are then converted into the position and orientation of the robot arm's end-effector tool. An optimal trajectory of the robotic arm is then planned to reach the target grasping location, and the motion is executed with either an open-loop or a closed-loop controller.
As robots grow more capable with ever more research, there is an increasing need for a general technique for fast and robust grasp detection on any object a robot encounters. One of the most important problems is how to transfer the knowledge a robot has learned to novel real-world objects accurately; this requires not only a real-time, accurate algorithm but also generalization.
Grasp detection methods fall mainly into two categories: analytical methods and empirical methods. An analytical method constrains the grasping pose by designing force-closure constraints that satisfy conditions such as stability and dexterity, based on the manipulator's parameters; it can be understood as solving and optimizing a constraint problem grounded in dynamics and geometry. When the grasping pose satisfies the force-closure condition, the object held by the gripper neither translates nor rotates under static friction, so the grasp remains stable. Poses generated analytically can guarantee a successful grasp of the target object, but such methods apply only to simple, idealized models. The variability of real scenes, the randomness of object placement, and image-sensor noise increase computational complexity on the one hand and undermine computational accuracy on the other. An empirical method detects the grasping pose and judges its plausibility using information in a knowledge base, classifying objects and estimating poses by similarity of object features. Unlike analytical methods, it does not require prior knowledge of parameters such as the target object's friction coefficient, and it is more robust; however, empirical methods usually cannot reconcile accuracy with real-time performance.
Disclosure of Invention
The invention aims to provide a plane grasping detection method based on computer vision and deep learning, characterized by comprising the following steps:
Step 1: collecting or self-making a grasping data set, the data set comprising RGB images with corresponding annotation information and depth information; performing data enhancement on the data set by scale transformation, translation, flipping and rotation, thereby expanding the data set;
Step 2: making and dividing training data from the expanded data set obtained in step 1; completing the depth map information with a depth completion algorithm and fusing the RGB images with the depth information; cropping and scaling the fused images to fit the input format of the grasping detection model, and randomly dividing the data into a training set and a validation set at a ratio of 9:1, used respectively for training and validating the grasping detection model;
Step 3: training the proposed grasping detection model with the training data, optimizing the gradient of the objective function with the back-propagation algorithm and a standard gradient-based optimization algorithm so as to minimize the difference between the detected grasp box and the ground truth; meanwhile, testing the grasping detection model on the validation set to adjust the learning rate during training and, to a certain extent, avoid overfitting of the model; wherein the objective function is defined as:

L_total = L_boxes + L_Q + L_angle

where L_boxes is the box loss, L_Q is the grasp quality score loss, and L_angle is the angle prediction loss;
Step 4: with the grasping detection model obtained by training, using real image data as the network input and the grasp quality score together with the five-dimensional grasp box representation as the model output; selecting the optimal grasp by ranking, converting it into the four vertex coordinates of the grasp box for visualization, and finally mapping it to real-world coordinates.
The five-dimensional grasping representation has been widely used in related work in recent years; it describes the grasp box as:

g = {x, y, θ, h, w}

The output of the grasping detection model is:

g = {x, y, θ, h, w, Q}

where (x, y) is the center point of the grasp box, h and w are its height and width respectively, θ is its orientation relative to the horizontal axis of the image, and Q is the grasp quality score, a value between 0 and 1 evaluating the likelihood of a successful grasp, with a larger Q indicating a more feasible grasp box.
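For concreteness, the sketch below (Python, with illustrative function names not taken from the patent) shows how a five-dimensional grasp g = {x, y, θ, h, w} can be converted into the four grasp-box vertices used for visualization in step 4; the conversion is ordinary planar geometry.

```python
import numpy as np

def grasp_to_vertices(x, y, theta, h, w):
    """Convert the five-dimensional grasp {x, y, theta, h, w} into the four
    vertices of the grasp box, in image (pixel) coordinates.
    theta is the angle relative to the image horizontal axis, in radians."""
    dx, dy = w / 2.0, h / 2.0  # half width / half height of the box
    corners = np.array([[-dx, -dy], [dx, -dy], [dx, dy], [-dx, dy]])
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    # Rotate each corner about the origin, then shift to the center (x, y).
    return corners @ rot.T + np.array([x, y])
```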
The grasping detection model (network model) comprises a front-end feature extractor and a back-end grasp predictor; the feature extractor is built from interconnected modules including convolution modules, attention residual modules and cross-stage local modules.
The fusion of the RGB image and the depth information comprises depth information extraction, depth map completion, and fusion of the RGB image with the completed depth map into an RGD image, the RGD image serving as the model training data.
For the fusion of the RGB image and the depth information, two training modes are provided: a single-modal mode trained directly on RGB data, and a multi-modal mode trained on RGD data. The RGD data is fused by replacing the B channel of the RGB image with the depth image; this design realizes multi-modality, provides more usable information, and showed good results in experiments. For extracting depth information from the data set, the following formula is designed:
d = (z − Min) / (Max − Min), with z clamped to [Min, Max]
the (x, y, z) is a coordinate in the point cloud information, Max is an upper depth value limit set according to a scene, Min is a lower depth value limit set according to the scene, invalid information can be filtered to a certain extent by the design of limiting a threshold range, in addition, global normalization can be realized, normalization is not performed aiming at a single image, and data are more standardized; the normalized value is enlarged by 255 times, and the scale of the RGB channel value is adjusted to meet the RGD fusion condition.
When the grasping detection model is tested with the validation set, the depth image first replaces a channel of the RGB image; each picture is then augmented with the following strategy: random vertical translation of 0-50 pixels and random rotation of 0-360°. Finally, a fixed-size square region is cropped about the center as the training image. The grasp box is predicted with a convolution module and convolution operations: the position and size of the grasp box are regressed directly; the grasp quality score is also obtained by direct regression, with a sigmoid applied at the final output to confine the predicted score to 0-1 so that it represents grasp confidence well; and the angle is predicted by classification.
The invention has the beneficial effects that:
1. It solves the problem that traditional analytical methods apply only to simple idealized models and cannot generalize.
2. It solves the problem that empirical grasp detection methods struggle to achieve accuracy while pursuing generalization.
3. It solves the problem that grasp detection methods struggle to guarantee real-time performance in real scenes.
Drawings
Fig. 1 is a schematic view of a suitable robot arm grabbing scene.
FIG. 2 is a five-dimensional representation of a capture box.
Fig. 3 is a schematic view of a grab detection model.
Fig. 4 is a diagram showing the grasping result.
Detailed Description
The invention provides a plane grasping detection method based on computer vision and deep learning. A source data set is collected and organized, or a data set is self-made for the grasping target, and data enhancement by scale transformation, translation, flipping and rotation is applied. The depth map information is completed with a depth completion algorithm, and the RGB image is fused with the depth information. The fused image is cropped and scaled to fit the model's input format and randomly divided into a training set and a validation set at a ratio of 9:1. The gradient of the objective function is optimized with the back-propagation algorithm and a standard gradient-based optimization algorithm so that the difference between the detected grasp box and the ground truth is minimized. Real image data is then fed to the grasping detection model for grasp detection, and the result is visualized.
The overall robot-arm grasping scene is shown in Fig. 1. It mainly comprises a robotic arm, a parallel two-finger gripper, a depth camera, a computer, a controller, the object to be grasped, and a platform. The depth camera photographs the object on the platform to obtain an RGB image and depth information. The computer reads and processes the RGB image and depth information, detecting a feasible grasp box from the image with the implemented grasp detection algorithm. The grasp box is mapped into the robot-arm coordinate system and transmitted to the controller, which plans and executes the grasping trajectory.
The invention mainly addresses the vision part of the robot-arm grasping problem. In prior work, a five-dimensional grasping representation was proposed and has since been widely used. It describes the grasp box as:

g = {x, y, θ, h, w}

where (x, y) is the center point of the grasp box, h and w are its height and width respectively, and θ is its orientation relative to the horizontal axis of the image.

The output of the grasping detection model is:

g = {x, y, θ, h, w, Q}

where Q is additionally the grasp quality score, a value between 0 and 1 evaluating the likelihood of a successful grasp, with a larger Q indicating a more feasible grasp box.
The five-dimensional representation has low dimensionality and a small computational cost. Its feasibility has been demonstrated in recent work: a grasp can be represented well in the image coordinate system (as shown in Fig. 2), with h and w defined by the shape of the gripper.
The five-dimensional representation describes the grasp box of a planar scene well, but only when a reasonable grasp point actually exists in the scene. When there is no object in the scene, or no feasible grasp point on any object, a grasp prediction is still produced, which is unreasonable. The five-dimensional representation is therefore extended: Q is introduced as a grasp quality score that evaluates the likelihood of a successful grasp with a value between 0 and 1, a larger Q indicating a more feasible grasp box. With a suitable threshold, grasps of poor feasibility can be filtered out.
The invention provides two training modes: one trains directly on RGB data (the single-modal mode); the other trains on RGD data (the multi-modal mode). The RGD data is fused by replacing the B channel of the RGB image with the depth image; this design realizes multi-modality, provides more usable information, and showed good results in experiments. For extracting depth information from the data set, the following formula is designed:
d = (z − Min) / (Max − Min), with z clamped to [Min, Max]
the design of limiting the threshold range can filter invalid information to a certain extent, and in addition, global normalization can be realized, and the normalization is not performed aiming at a single image, so that the data is more standardized. The normalized value is enlarged by 255 times, and the scale of the RGB channel value is adjusted to meet the RGD fusion condition. Due to the limitation of equipment, partial data of the point cloud is often lost, so that a complete depth map cannot be obtained. Aiming at the problem, the invention uses a deep completion method to make a corresponding mask file and uses an NS method in OpenCV to repair the mask file.
The depth image replaces a channel of the RGB image, and each picture is then augmented with the following strategy: random vertical translation of 0-50 pixels and random rotation of 0-360°. Finally, a fixed-size square region is cropped about the center as the training image. The training images are divided into a training set and a validation set at a ratio of 9:1, used respectively for training and testing the model.
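The augmentation and split might look like the following sketch; the crop size is illustrative, and in practice the same geometric transform must also be applied to the grasp-box annotations.

```python
import random
import cv2

def augment(image, crop=320):
    """Random vertical translation of 0-50 pixels, random rotation of
    0-360 degrees, then a fixed-size square crop about the center."""
    h, w = image.shape[:2]
    ty = random.randint(-50, 50)
    angle = random.uniform(0.0, 360.0)
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    m[1, 2] += ty  # fold the vertical shift into the affine transform
    out = cv2.warpAffine(image, m, (w, h))
    cx, cy = w // 2, h // 2
    return out[cy - crop // 2: cy + crop // 2, cx - crop // 2: cx + crop // 2]

def split_dataset(samples, ratio=0.9):
    """Randomly divide samples into training and validation sets at 9:1."""
    random.shuffle(samples)
    k = int(len(samples) * ratio)
    return samples[:k], samples[k:]
```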
The grasping detection model designed by the invention comprises a front-end feature extractor and a back-end grasp predictor, as shown in Fig. 3. Deep convolutional networks have strong feature-extraction ability in fields such as image classification and object detection, so a deep convolutional network is designed as the backbone for feature extraction. It is composed mainly of interconnected convolution modules, attention residual modules and cross-stage local modules. The network is sufficiently deep, contains cross-stage connections, and has strong feature-extraction capability and efficiency.
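The patent does not publish the exact layer configuration, so the following PyTorch sketch is only one plausible reading of the named building blocks: a convolution module, and an attention residual module realized as squeeze-and-excitation-style channel attention on a residual branch. A cross-stage local module would follow the same pattern, splitting channels, processing one half, and concatenating.

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """Convolution module: conv + batch norm + activation."""
    def __init__(self, c_in, c_out, k=3, s=1):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(c_in, c_out, k, s, k // 2, bias=False),
            nn.BatchNorm2d(c_out),
            nn.LeakyReLU(0.1, inplace=True))

    def forward(self, x):
        return self.block(x)

class AttentionResidual(nn.Module):
    """Residual block gated by channel attention (one possible design)."""
    def __init__(self, c):
        super().__init__()
        self.body = nn.Sequential(ConvBlock(c, c), ConvBlock(c, c))
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(c, c // 4, 1), nn.ReLU(inplace=True),
            nn.Conv2d(c // 4, c, 1), nn.Sigmoid())

    def forward(self, x):
        y = self.body(x)
        return x + y * self.gate(y)  # residual connection with attention
```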
The grasp predictor predicts the grasp box with a convolution module and convolution operations. The position and size of the grasp box are regressed directly; the grasp quality score is also obtained by direct regression, with a sigmoid applied at the final output so that the predicted score lies in 0-1 and represents grasp confidence well; the angle is predicted by classification.
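Sketching the three heads makes that output structure concrete (PyTorch; the number of angle bins is an assumption, since the patent does not state it):

```python
import torch
import torch.nn as nn

class GraspPredictor(nn.Module):
    """Back-end grasp predictor: direct regression of position and size,
    a sigmoid-bounded quality score, and angle as classification."""
    def __init__(self, c_feat, n_angle_bins=18):
        super().__init__()
        self.pos_size = nn.Conv2d(c_feat, 4, 1)          # x, y, h, w
        self.quality = nn.Conv2d(c_feat, 1, 1)           # Q in (0, 1)
        self.angle = nn.Conv2d(c_feat, n_angle_bins, 1)  # theta class logits

    def forward(self, feat):
        return (self.pos_size(feat),
                torch.sigmoid(self.quality(feat)),  # confine Q to (0, 1)
                self.angle(feat))
```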
The loss function of the model is divided into three parts, corresponding to the three outputs of the network: L_boxes is the box loss, L_Q the grasp quality score loss, and L_angle the angle prediction loss. The gradient of the objective function is optimized with the back-propagation algorithm and a standard gradient-based optimization algorithm so that the difference between the detected grasp box and the ground truth is minimized.

L_total = L_boxes + L_Q + L_angle
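The patent defines only the sum, not the individual terms, so the concrete choices below (smooth L1 for the box, binary cross-entropy for the quality score, cross-entropy over angle bins) are illustrative assumptions consistent with regressed boxes, a sigmoid-bounded score, and a classified angle:

```python
import torch.nn.functional as F

def total_loss(pred, target):
    """L_total = L_boxes + L_Q + L_angle for one batch; `target` is assumed
    to hold dense maps matching the shapes of the predictor's outputs."""
    boxes, quality, angle_logits = pred
    l_boxes = F.smooth_l1_loss(boxes, target["boxes"])
    l_q = F.binary_cross_entropy(quality, target["quality"])
    l_angle = F.cross_entropy(angle_logits, target["angle_bin"])
    return l_boxes + l_q + l_angle
```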
With the grasping detection model obtained by training, real image data is used as the network input, the five-dimensional representation {x, y, θ, h, w} of the grasp box is output, and it is converted into the four vertex coordinates of the grasp box. Meanwhile, the quality scores of the grasp boxes are ranked, the boxes whose scores exceed the set threshold are kept, and the box with the highest score is output and visualized, as shown in Fig. 4. The camera's intrinsic and extrinsic parameters are then calibrated with Zhang Zhengyou's calibration method, and pixel points in the image are mapped to three-dimensional coordinates in the real world:
s [u, v, 1]^T = K [R | t] [X_w, Y_w, Z_w, 1]^T, where K is the camera intrinsic matrix, [R | t] the extrinsic parameters, and s a scale factor.
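This post-processing reduces to two small routines, sketched here with illustrative names and an assumed quality threshold; the back-projection simply inverts the pinhole model above with the calibrated intrinsics K and extrinsics (R, t):

```python
import numpy as np

def best_grasp(grasps, q_threshold=0.5):
    """Keep grasp boxes whose quality score Q exceeds the threshold and
    return the highest-scoring one, or None if nothing passes."""
    kept = [g for g in grasps if g["Q"] > q_threshold]
    return max(kept, key=lambda g: g["Q"]) if kept else None

def pixel_to_world(u, v, z, K, R, t):
    """Back-project pixel (u, v) at camera depth z into world coordinates
    using the intrinsics K and extrinsics (R, t) from camera calibration."""
    p_cam = z * np.linalg.inv(K) @ np.array([u, v, 1.0])  # camera frame
    return R.T @ (p_cam - t)                              # world frame
```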
Compared with other algorithms, the proposed algorithm achieves higher accuracy and efficiency and shows good results in real scenes.
The present invention is not limited to the above embodiments, and any changes or substitutions that can be easily made by those skilled in the art within the technical scope of the present invention are also within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (6)

1. A plane grasping detection method based on computer vision and deep learning, characterized by comprising:

Step 1: collecting or self-making a grasping data set comprising RGB images with corresponding annotation information and depth information; performing data enhancement on the data set by scale transformation, translation, flipping and rotation, thereby expanding the data set;

Step 2: making and dividing training data from the expanded data set obtained in step 1; completing the depth map information with a depth completion algorithm and fusing the RGB images with the depth information; cropping and scaling the fused images to fit the input format of the grasping detection model, and randomly dividing the data into a training set and a validation set at a ratio of 9:1, used respectively for training and validating the grasping detection model;

Step 3: training the proposed grasping detection model with the training data, optimizing the gradient of the objective function with the back-propagation algorithm and a standard gradient-based optimization algorithm so as to minimize the difference between the detected grasp box and the ground truth; meanwhile, testing the grasping detection model on the validation set to adjust the learning rate during training and, to a certain extent, avoid overfitting of the model; wherein the objective function is defined as:

L_total = L_boxes + L_Q + L_angle

where L_boxes is the box loss, L_Q is the grasp quality score loss, and L_angle is the angle prediction loss;

Step 4: with the grasping detection model obtained by training, using real image data as the network input and outputting the grasp quality score and the five-dimensional grasp box representation; selecting the optimal grasp by ranking, converting it into the four vertex coordinates of the grasp box for visualization, and finally mapping it to real-world coordinates.

2. The plane grasping detection method based on computer vision and deep learning according to claim 1, characterized in that the five-dimensional grasping representation, which has been widely used in related work in recent years, describes the grasp box as:

g = {x, y, θ, h, w}

and the output of the grasping detection model is:

g = {x, y, θ, h, w, Q}

where (x, y) is the center point of the grasp box, h and w are its height and width respectively, θ is its orientation relative to the horizontal axis of the image, and Q is the grasp quality score, a value between 0 and 1 evaluating the likelihood of a successful grasp, with a larger Q indicating a more feasible grasp box.

3. The plane grasping detection method based on computer vision and deep learning according to claim 1, characterized in that the grasping detection model (network model) comprises a front-end feature extractor and a back-end grasp predictor, the feature extractor being built from interconnected modules including convolution modules, attention residual modules and cross-stage local modules.

4. The plane grasping detection method based on computer vision and deep learning according to claim 1, characterized in that the fusion of the RGB image and the depth information comprises depth information extraction, depth map completion, and fusion of the RGB image with the completed depth map into an RGD image, the RGD image serving as the model training data.

5. The plane grasping detection method based on computer vision and deep learning according to claim 1, characterized in that, for the fusion of the RGB image and the depth information, the training modes comprise a single-modal mode trained on RGB data and a multi-modal mode trained on RGD data, the RGD data being fused by replacing the B channel of the RGB image with the depth image, a design that realizes multi-modality, provides more usable information, and showed good results in experiments; for extracting depth information from the data set, the following formula is designed:

d = (z − Min) / (Max − Min), with z clamped to [Min, Max]

where (x, y, z) are coordinates in the point cloud information, Max is the upper depth limit set for the scene, and Min is the lower depth limit set for the scene; limiting the threshold range filters out invalid information to a certain extent and makes the normalization global rather than per-image, so the data is more standardized; the normalized value is scaled by 255 to match the scale of the RGB channel values, satisfying the RGD fusion condition.

6. The plane grasping detection method based on computer vision and deep learning according to claim 1, characterized in that, when testing the grasping detection model with the validation set, the depth image replaces a channel of the RGB image, and each picture is then augmented with the following strategy: random vertical translation of 0-50 pixels and random rotation of 0-360°; finally, a fixed-size square region is cropped about the center as the training image; the grasp box is predicted with a convolution module and convolution operations, wherein the position and size of the grasp box are regressed directly; the grasp quality score is also obtained by direct regression, with a sigmoid applied at the final output to confine the predicted score to 0-1 so that it represents grasp confidence well; and the angle is predicted by classification.
CN202110207871.0A 2021-02-25 2021-02-25 Plane grabbing detection method based on computer vision and deep learning Active CN112906797B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110207871.0A CN112906797B (en) 2021-02-25 2021-02-25 Plane grabbing detection method based on computer vision and deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110207871.0A CN112906797B (en) 2021-02-25 2021-02-25 Plane grabbing detection method based on computer vision and deep learning

Publications (2)

Publication Number Publication Date
CN112906797A true CN112906797A (en) 2021-06-04
CN112906797B CN112906797B (en) 2024-01-12

Family

ID=76108019

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110207871.0A Active CN112906797B (en) 2021-02-25 2021-02-25 Plane grabbing detection method based on computer vision and deep learning

Country Status (1)

Country Link
CN (1) CN112906797B (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113658259A (en) * 2021-08-30 2021-11-16 武汉科技大学 Automatic grabbing method of cold rolling mill roller based on visual positioning
CN113781538A (en) * 2021-07-27 2021-12-10 武汉中海庭数据技术有限公司 Image depth information fusion method and system, electronic equipment and storage medium
CN113808205A (en) * 2021-08-31 2021-12-17 华南理工大学 A Fast Dynamic Object Grasping Method Based on Detection Constraints
CN114012722A (en) * 2021-11-01 2022-02-08 苏州科德软体电路板有限公司 Mechanical arm target grabbing method based on deep learning and edge detection
CN114049318A (en) * 2021-11-03 2022-02-15 重庆理工大学 Multi-mode fusion feature-based grabbing pose detection method
CN114193446A (en) * 2021-11-22 2022-03-18 上海交通大学宁波人工智能研究院 Closed loop capture detection method based on morphological image processing
CN114358136A (en) * 2021-12-10 2022-04-15 鹏城实验室 Image data processing method and device, intelligent terminal and storage medium
CN114820796A (en) * 2022-05-16 2022-07-29 中国科学技术大学 Visual capture detection method and system based on self-supervision representation learning
CN114998573A (en) * 2022-04-22 2022-09-02 北京航空航天大学 A Grasping Pose Detection Method Based on RGB-D Feature Deep Fusion
CN115436081A (en) * 2022-08-23 2022-12-06 中国人民解放军63653部队 Simulation device for scattered pollutants and target pickup performance test method
CN115972198A (en) * 2022-12-05 2023-04-18 无锡宇辉信息技术有限公司 Mechanical arm visual grabbing method and device under incomplete information condition
WO2023165361A1 (en) * 2022-03-02 2023-09-07 华为技术有限公司 Data processing method and related device
CN116852354A (en) * 2023-06-27 2023-10-10 山东新一代信息产业技术研究院有限公司 A robotic arm grabbing detection method based on improved Cascade R-CNN network
CN117681211A (en) * 2024-01-23 2024-03-12 哈尔滨工业大学 Deep learning-based two-finger underactuated mechanical gripper grabbing pose detection method
CN120014057A (en) * 2025-04-22 2025-05-16 江苏省特种设备安全监督检验研究院 Precise positioning method for three-dimensional space of suspended object

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050151495A1 (en) * 2003-11-25 2005-07-14 Jidosha Denki Kogyo Co., Ltd. Back door opening and closing apparatus
CN107204025A (en) * 2017-04-18 2017-09-26 华北电力大学 The adaptive clothing cartoon modeling method that view-based access control model is perceived
CN110211180A (en) * 2019-05-16 2019-09-06 西安理工大学 A kind of autonomous grasping means of mechanical arm based on deep learning
CN111428815A (en) * 2020-04-16 2020-07-17 重庆理工大学 Mechanical arm grabbing detection method based on Anchor angle mechanism
CN111523486A (en) * 2020-04-24 2020-08-11 重庆理工大学 A Grab Detection Method for Robot Arm Based on Improved CenterNet
CN111695562A (en) * 2020-05-26 2020-09-22 浙江工业大学 Autonomous robot grabbing method based on convolutional neural network
CN111723782A (en) * 2020-07-28 2020-09-29 北京印刷学院 Visual robot grasping method and system based on deep learning

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050151495A1 (en) * 2003-11-25 2005-07-14 Jidosha Denki Kogyo Co., Ltd. Back door opening and closing apparatus
CN107204025A (en) * 2017-04-18 2017-09-26 华北电力大学 The adaptive clothing cartoon modeling method that view-based access control model is perceived
CN110211180A (en) * 2019-05-16 2019-09-06 西安理工大学 A kind of autonomous grasping means of mechanical arm based on deep learning
CN111428815A (en) * 2020-04-16 2020-07-17 重庆理工大学 Mechanical arm grabbing detection method based on Anchor angle mechanism
CN111523486A (en) * 2020-04-24 2020-08-11 重庆理工大学 A Grab Detection Method for Robot Arm Based on Improved CenterNet
CN111695562A (en) * 2020-05-26 2020-09-22 浙江工业大学 Autonomous robot grabbing method based on convolutional neural network
CN111723782A (en) * 2020-07-28 2020-09-29 北京印刷学院 Visual robot grasping method and system based on deep learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Guoliang Zhang, "Object Detection and Grabbing Based on Machine Vision for Service Robot", 2018 IEEE 9th Annual Information Technology, Electronics and Mobile Communication Conference *
Zhang Kaiyu, "Grasping Pose Detection of a Robotic Arm Based on RGB-D Images", China Masters' Theses Full-text Database, Information Science and Technology *
Du Xuedan; Cai Yinghao; Lu Tao; Wang Shuo; Yan Zhe, "A Robotic Grasping Method Based on Deep Learning", Robot, no. 06 *

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113781538A (en) * 2021-07-27 2021-12-10 武汉中海庭数据技术有限公司 Image depth information fusion method and system, electronic equipment and storage medium
CN113781538B (en) * 2021-07-27 2024-02-13 武汉中海庭数据技术有限公司 Image depth information fusion method, system, electronic equipment and storage medium
CN113658259A (en) * 2021-08-30 2021-11-16 武汉科技大学 Automatic grabbing method of cold rolling mill roller based on visual positioning
CN113808205A (en) * 2021-08-31 2021-12-17 华南理工大学 A Fast Dynamic Object Grasping Method Based on Detection Constraints
CN113808205B (en) * 2021-08-31 2023-07-18 华南理工大学 A Fast Dynamic Object Grasping Method Based on Detection Constraints
CN114012722A (en) * 2021-11-01 2022-02-08 苏州科德软体电路板有限公司 Mechanical arm target grabbing method based on deep learning and edge detection
CN114049318A (en) * 2021-11-03 2022-02-15 重庆理工大学 Multi-mode fusion feature-based grabbing pose detection method
CN114049318B (en) * 2021-11-03 2025-03-04 重庆理工大学 A grasping posture detection method based on multimodal fusion features
CN114193446B (en) * 2021-11-22 2023-04-25 上海交通大学宁波人工智能研究院 Closed loop grabbing detection method based on morphological image processing
CN114193446A (en) * 2021-11-22 2022-03-18 上海交通大学宁波人工智能研究院 Closed loop capture detection method based on morphological image processing
CN114358136A (en) * 2021-12-10 2022-04-15 鹏城实验室 Image data processing method and device, intelligent terminal and storage medium
CN114358136B (en) * 2021-12-10 2024-05-17 鹏城实验室 Image data processing method and device, intelligent terminal and storage medium
EP4475013A4 (en) * 2022-03-02 2025-05-07 Huawei Technologies Co., Ltd. Data processing method and related device
WO2023165361A1 (en) * 2022-03-02 2023-09-07 华为技术有限公司 Data processing method and related device
CN114998573A (en) * 2022-04-22 2022-09-02 北京航空航天大学 A Grasping Pose Detection Method Based on RGB-D Feature Deep Fusion
CN114998573B (en) * 2022-04-22 2024-05-14 北京航空航天大学 Grabbing pose detection method based on RGB-D feature depth fusion
CN114820796B (en) * 2022-05-16 2024-11-05 中国科学技术大学 Visual grasping detection method and system based on self-supervised representation learning
CN114820796A (en) * 2022-05-16 2022-07-29 中国科学技术大学 Visual capture detection method and system based on self-supervision representation learning
CN115436081B (en) * 2022-08-23 2023-10-10 中国人民解放军63653部队 Target pickup performance test method
CN115436081A (en) * 2022-08-23 2022-12-06 中国人民解放军63653部队 Simulation device for scattered pollutants and target pickup performance test method
CN115972198A (en) * 2022-12-05 2023-04-18 无锡宇辉信息技术有限公司 Mechanical arm visual grabbing method and device under incomplete information condition
CN115972198B (en) * 2022-12-05 2023-10-10 无锡宇辉信息技术有限公司 Mechanical arm visual grabbing method and device under incomplete information condition
CN116852354A (en) * 2023-06-27 2023-10-10 山东新一代信息产业技术研究院有限公司 A robotic arm grabbing detection method based on improved Cascade R-CNN network
CN117681211A (en) * 2024-01-23 2024-03-12 哈尔滨工业大学 Deep learning-based two-finger underactuated mechanical gripper grabbing pose detection method
CN120014057A (en) * 2025-04-22 2025-05-16 江苏省特种设备安全监督检验研究院 Precise positioning method for three-dimensional space of suspended object

Also Published As

Publication number Publication date
CN112906797B (en) 2024-01-12

Similar Documents

Publication Publication Date Title
CN112906797A (en) Plane grabbing detection method based on computer vision and deep learning
CN113524194B (en) Target grabbing method of robot vision grabbing system based on multi-mode feature deep learning
CN109702741B (en) Robotic arm visual grasping system and method based on self-supervised learning neural network
CN115816460B (en) Mechanical arm grabbing method based on deep learning target detection and image segmentation
CN107813310B (en) A multi-gesture robot control method based on binocular vision
CN109255813B (en) Man-machine cooperation oriented hand-held object pose real-time detection method
CN114952809B (en) Workpiece recognition and pose detection method, system, and grasping control method of a robotic arm
CN110363815A (en) A robot grasp detection method based on instance segmentation under single-view point cloud
CN108280856A (en) The unknown object that network model is inputted based on mixed information captures position and orientation estimation method
CN115861999B (en) A robot grasping detection method based on multimodal visual information fusion
CN111199556B (en) Camera-based indoor pedestrian detection and tracking method
CN112509063A (en) Mechanical arm grabbing system and method based on edge feature matching
CN110378325B (en) Target pose identification method in robot grabbing process
CN101587591B (en) Vision Accurate Tracking Method Based on Two-parameter Threshold Segmentation
CN113808205B (en) A Fast Dynamic Object Grasping Method Based on Detection Constraints
CN114882109A (en) Robot grabbing detection method and system for sheltering and disordered scenes
CN112926503B (en) A Method for Automatically Generating Grabbing Datasets Based on Rectangle Fitting
CN115861780B (en) A YOLO-GGCNN-based robotic arm detection and grasping method
CN110992422B (en) Medicine box posture estimation method based on 3D vision
CN112975957B (en) Target extraction method, system, robot and storage medium
CN114714365A (en) Disordered workpiece grabbing method and system based on cloud platform
CN116249607A (en) Method and device for robotically gripping three-dimensional objects
CN114998573B (en) Grabbing pose detection method based on RGB-D feature depth fusion
CN116984269A (en) Gangue grabbing method and system based on image recognition
CN115578460A (en) Robot Grasping Method and System Based on Multimodal Feature Extraction and Dense Prediction

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant