
CN110238840B - Mechanical arm autonomous grabbing method based on vision - Google Patents

Mechanical arm autonomous grabbing method based on vision

Info

Publication number
CN110238840B
CN110238840B (application CN201910335507.5A)
Authority
CN
China
Prior art keywords
grasping
image
grabbing
label
map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910335507.5A
Other languages
Chinese (zh)
Other versions
CN110238840A (en)
Inventor
成慧
蔡俊浩
苏竟成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN201910335507.5A priority Critical patent/CN110238840B/en
Publication of CN110238840A publication Critical patent/CN110238840A/en
Application granted granted Critical
Publication of CN110238840B publication Critical patent/CN110238840B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1602Programme controls characterised by the control system, structure, architecture
    • B25J9/161Hardware, e.g. neural networks, fuzzy logic, interfaces, processor
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1694Programme controls characterised by use of sensors other than normal servo-feedback from position, speed or acceleration sensors, perception control, multi-sensor controlled systems, sensor fusion
    • B25J9/1697Vision controlled systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Automation & Control Theory (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the field of robotics, and more particularly to a vision-based autonomous grasping method for a robotic arm. A corrective grasping strategy based on an adversarial grasping rule is proposed; with this strategy, trial-and-error grasping on a simulation platform yields grasping samples that conform to the rule. The samples collected in this way clearly express the grasping pattern defined by the adversarial rule, which benefits model learning. The entire data collection process requires neither manual intervention nor any real data, avoiding the problems that real data collection may bring. Only a small amount of simulation data collected by this method is needed, and the trained model can be applied directly to different real grasping scenes. The whole training process requires neither domain adaptation nor domain randomization, and achieves high accuracy and robustness.

Description

Mechanical arm autonomous grabbing method based on vision
Technical Field
The invention relates to the technical field of robots, and in particular to a vision-based autonomous grasping method for a robotic arm.
Background
Robotic grasping methods fall mainly into two categories: analytic methods and empirical methods. Analytic methods generally construct force-closure grasps based on rules defined by four properties, namely dexterity, equilibrium, stability, and dynamic certainty; such approaches can usually be formulated as constrained optimization problems. Empirical methods are data-driven: they extract feature representations of objects from data and then make grasping decisions using hand-designed grasping heuristics.
As deep learning has made tremendous progress in computer vision, it has also begun to receive extensive attention and research in robotics. Pinto and Gupta (Supersizing Self-supervision: Learning to Grasp from 50K Tries and 700 Robot Hours) collected a dataset of 50,000 grasps through robot trial-and-error and trained a deep neural network to decide grasping angles; this method achieves 73% accuracy on unseen objects. Levine et al. (Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection) collected a dataset of 800,000 grasps over two months using 6 to 14 robots and trained an evaluation model on it; the model scores candidate action commands for the current scene to find the best one, reaching about 80% grasping accuracy.
These methods achieve high grasping success rates, but they require real robots to perform trial-and-error grasping to acquire data, which is time-consuming, labor-intensive, and poses significant safety hazards.
Mahler et al. (Dex-Net 2.0: Deep Learning to Plan Robust Grasps with Synthetic Point Clouds and Analytic Grasp Metrics) sample object grasp points in a simulation platform based on an antipodal grasping rule and then obtain highly robust samples through force-closure analysis. A grasp-quality evaluation neural network trained on such data reaches a grasp success rate of up to 93% on adversarial objects. Although this method achieves high accuracy, the amount of data required to train the model is very large; one reason is that the collected sample data do not clearly reflect the defined grasping pattern.
Disclosure of Invention
To overcome at least one defect of the prior art, the invention provides a vision-based autonomous grasping method for a robotic arm; the data it collects through grasping in a simulation platform are well suited to model learning.
In order to solve the technical problems, the invention adopts the technical scheme that: a mechanical arm autonomous grabbing method based on vision comprises the following steps:
S1. In a simulation environment, build a scene similar to the real one and collect a global image;
S2. Preprocess the data. The preprocessed data comprise a global image containing the whole workspace, an object mask, and a label map at the same scale as the global image. The processing comprises: first generating the object mask from the set of pixel positions occupied by the object in the image; then generating a label mask from the object mask, the grasp pixel position, and the grasp label, and generating the label map from the grasp position and the grasp label; and finally discretizing the grasping angle according to the grasp problem definition;
S3. Train a deep neural network:
(1) normalize the input RGB images and assemble them into a batch;
(2) feed the batch into a fully convolutional neural network to obtain the output;
(3) compute the error between the prediction and the label using a cross-entropy loss combined with the label mask:
L(Y, F) = − Σ_{i=1..H} Σ_{j=1..W} Σ_{k=1..3} M_{ijk} · Y_{ijk} · log( exp(F_{ijk}) / Σ_{l=1..3} exp(F_{ijl}) )
where Y ∈ ℝ^{H×W×3} is the label map, M ∈ ℝ^{H×W×3} is the label mask, H and W are the height and width of the label map, i, j and k are position indices in the 3-channel map, l is the channel index inside the softmax, F ∈ ℝ^{H×W×3} is the output feature map of the last convolutional layer, and ℝ denotes the real field with superscripts giving tensor dimensions;
S4. Apply the trained model in a real grasping environment.
The invention provides a vision-based autonomous grasping method for a robotic arm that trains an end-to-end deep neural network for pixel-level grasp prediction from a small amount of grasping data acquired in simulation; the learned model can be applied directly to real grasping scenes. The whole process requires neither domain adaptation nor domain randomization, and no data collected in the real environment.
Further, step S1 specifically comprises:
S11. Place a background texture, a robotic arm with a gripper, a camera, and an object to be grasped in the workspace of the simulation environment;
S12. Place the object in the workspace, select a position where the object exists using the camera, record the image, the pixel position corresponding to the grasp point, the mask of the object in the image, and the grasp angle, and then randomly select an angle for the robotic arm to perform a trial-and-error grasp;
S13. Judge whether the grasp succeeds. If it fails, directly save the image I, the set C of pixel positions occupied by the object in the image, the pixel position p corresponding to the grasp point, the grasp angle ψ, and the failure label l. If it succeeds, re-record the global image I′ and the corresponding set C′ of pixel positions occupied by the object, and then save the image I′, the set C′, the pixel position p corresponding to the grasp point, the grasp angle ψ, and the success label l.
Further, the grasp problem is defined as follows: a vertical planar grasp is defined as g = (p, ω, η), where p = (x, y, z) denotes the position of the grasp point in Cartesian coordinates, ω ∈ [0, 2π) denotes the rotation angle of the end effector, and η ∈ {0, 1}³ is a 3-dimensional one-hot code representing the grasp function. The grasp function has three classes: graspable, non-graspable, and background. Projected into image space, a grasp in image I can be represented as g̃ = (p̃, ω̃, η), where p̃ = (h, w) denotes the grasp position in the image and ω̃ denotes the discretized grasp angle. Every pixel in the image can be assigned a grasp function, so the whole grasp function map can be represented as C = {C_i | i = 1, …, 16}, where C_i ∈ ℝ^{H×W×3} is the grasp function map of the image at the i-th angle; its 3 channels correspond to the graspable, non-graspable, and background classes. The first channel C_i^(1) ∈ ℝ^{H×W} is extracted from each grasp function map C_i, and these channels are combined to form G ∈ ℝ^{16×H×W}. ℝ denotes the real field, and the superscripts give the tensor dimensions.
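Purely for illustration, a minimal Python sketch of this grasp representation under the definitions above (the class and function names are hypothetical and not part of the patent; the 16-angle discretization follows the text):

    from dataclasses import dataclass
    import numpy as np

    NUM_ANGLES = 16  # discretization of the grasp angle used throughout the text

    @dataclass
    class Grasp:
        """Vertical planar grasp g = (p, omega, eta)."""
        p: tuple            # (x, y, z) grasp point in Cartesian coordinates
        omega: float        # end-effector rotation angle in [0, 2*pi)
        eta: np.ndarray     # 3-dim one-hot code: graspable / non-graspable / background

    @dataclass
    class ImageGrasp:
        """Grasp projected into image space."""
        h: int              # row of the grasp position in the image
        w: int              # column of the grasp position in the image
        angle_idx: int      # discretized grasp angle index in {0, ..., NUM_ANGLES - 1}
        eta: np.ndarray     # same 3-dim one-hot code

    def stack_graspable_channels(C_maps):
        """C_maps: list of NUM_ANGLES grasp function maps, each of shape (H, W, 3).
        Returns G of shape (NUM_ANGLES, H, W) built from the first (graspable) channel."""
        return np.stack([C[..., 0] for C in C_maps], axis=0)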
Further, the most robust grasp point is obtained by solving:
i*, h*, w* = argmax_{i,h,w} G(i, h, w)
where G(i, h, w) denotes the confidence of the graspable class at rotation index i and image position (h, w); (h*, w*) is the position the end effector should reach in image space, and i* indicates the discretized angle to which the end effector rotates before executing the grasp.
Further, during training a parameterized function f_θ is defined that realizes a pixel-level mapping from an image to its grasp function map:
C_i = f_θ(I^(i))
where I^(i) is image I rotated by the i-th discretization angle and C_i is the corresponding grasp function map; f_θ is implemented with a deep neural network. Combined with the loss function, the overall training objective can be defined as:
θ* = argmin_θ Σ_i L( Ŷ_i, f_θ(I^(i)) )
where Ŷ_i denotes the label map for the i-th rotation.
Further, consider a scene in which only one object is placed in the workspace. Define c1 and c2 as the contact points of the gripper's two fingers with the object, n1 and n2 as the corresponding surface normal vectors, and g as the grasping direction of the gripper in image space, where c1, c2, n1, n2, g ∈ ℝ². From these definitions:
g = (c2 − c1) / ||c2 − c1||
where ||·|| denotes the norm operation. A grasp operation is defined as an adversarial grasp when it satisfies the following conditions:
ω1 = ∠(g, n1), ω2 = ∠(g, n2)
ω1 ≤ θ1
ω2 ≥ θ2
where θ1 and θ2 are non-negative thresholds tending to 0 and π, respectively, on the angle between the grasping direction and the surface normals at the two contact points, and ω1 and ω2 are the angles between the grasping direction and the surface normals at the two contact points. When the grasping direction of the gripper is parallel to the contact-point normals, the grasp is defined as a stable adversarial grasp.
All data are collected in the simulation platform; no real data are needed, which avoids the problems that collecting data in the real environment may cause. By adding the adversarial grasping rule in the simulation platform, the collected data effectively reflect the corresponding grasping pattern, so only a very small amount of grasping data is needed before the trained model can be applied directly to real grasping scenes. The invention realizes end-to-end, pixel-level grasp-function prediction with a fully convolutional neural network: each output pixel can capture global information of the input image, which lets the model learn more efficiently and make more accurate predictions.
Further, step S4 comprises:
S41. Acquire an RGB image and a depth map of the workspace with the camera;
S42. Normalize the RGB image, rotate it to 16 angles, and feed the rotated images into the model to obtain 16 grasp function maps;
S43. According to the grasp problem definition, take the first channel of each function map, combine them, and find the position of the maximum value to obtain the best grasp position and grasp angle in image space;
S44. Map the obtained image position into 3-D space, solve the robot arm control command via inverse kinematics, rotate the end effector to the grasp angle after it reaches the position directly above the object, and determine the descent height of the arm from the collected depth map to avoid collisions.
Further, step S42 specifically comprises: the image input to the fully convolutional neural network model contains a global image of the whole workspace; features are first extracted using ResNet-50 as the encoder, then a four-layer upsampling module of bilinear interpolation plus convolution is applied, and finally a 5×5 convolution produces a grasp function map at the same scale as the input.
Compared with the prior art, the beneficial effects are:
1. The invention proposes a corrective grasping strategy based on an adversarial grasping rule; with this strategy, trial-and-error grasping on a simulation platform yields grasping samples that conform to the rule. Samples collected this way clearly express the grasping pattern defined by the adversarial rule, which benefits model learning. The entire data collection process requires neither manual intervention nor any real data, avoiding the problems that real data collection may bring.
2. Only a small amount of simulation data collected by this method is needed, and the trained model can be applied directly to different real grasping scenes. The whole training process requires neither domain adaptation nor domain randomization, and achieves high accuracy and robustness.
3. A fully convolutional deep neural network is designed: the network takes an image containing the whole workspace as input and predicts the grasp function of every pixel. This globally-input, pixel-level-prediction network structure learns the corresponding grasping patterns faster and better.
Drawings
FIG. 1 is a diagram illustrating the parameters defined in the adversarial grasping rule in the simulator according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of the fully convolutional neural network of the present invention.
Detailed Description
The drawings are for illustration purposes only and are not to be construed as limiting the invention; for the purpose of better illustrating the embodiments, certain features of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product; it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted. The positional relationships depicted in the drawings are for illustrative purposes only and are not to be construed as limiting the invention.
Example 1:
defining the grabbing problem: defining the vertical plane grasp as g ═ (p, ω, η), where p ═ x, y, z denotes the position of the grasp point in cartesian coordinates, ω ∈ [0,2 π) denotes the rotation angle of the terminal,
Figure GDA0002683904630000061
is a 3-dimensional one-bit efficient code used to represent the grab function. The grabbing function is divided into three types, i.e. grippable function, non-grippable function and background function. When projected into image space, capture at image I may be represented as
Figure GDA0002683904630000062
Wherein
Figure GDA0002683904630000063
Indicating the position of the grasp in the image,
Figure GDA0002683904630000064
representing discrete grasping angles. Discretization can reduce the complexity of the learning process. Thus, each pixel in the image may define a capture function, so the entire capture function graph may be represented as:
Figure GDA0002683904630000065
wherein
Figure GDA0002683904630000066
The capture function graph of the image at the given ith angle is obtained. In the figure, 3 channels respectively represent three categories of graspable, non-graspable and background. From each grab function graph CiIn the first channel
Figure GDA0002683904630000067
(i.e., snatchable functional channel) and combined together
Figure GDA0002683904630000068
Figure GDA0002683904630000069
Thus, the most robust grasp point can be obtained by solving:
i*, h*, w* = argmax_{i,h,w} G(i, h, w)
where G(i, h, w) denotes the confidence of the graspable class at rotation index i and image position (h, w); (h*, w*) is the position the end effector should reach in image space, and i* indicates the discretized angle to which the end effector rotates before executing the grasp.
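As an illustrative sketch only (assuming the stacked graspable-channel tensor G described above is available as a NumPy array; the function name is hypothetical), the arg-max selection could look like this:

    import numpy as np

    def select_most_robust_grasp(G):
        """G: array of shape (16, H, W), graspable confidence per angle index and pixel.
        Returns (i_star, h_star, w_star) = argmax_{i,h,w} G(i, h, w)."""
        i_star, h_star, w_star = np.unravel_index(np.argmax(G), G.shape)
        return int(i_star), int(h_star), int(w_star)

    # The end effector moves to (h_star, w_star) in image space and rotates to the
    # i_star-th discretized angle before closing the gripper.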
During training, a parameterized function f_θ is defined that realizes a pixel-level mapping from an image to its grasp function map:
C_i = f_θ(I^(i))
where I^(i) is image I rotated by the i-th discretization angle and C_i is the corresponding grasp function map.
f_θ may be implemented with a deep neural network. For example, the function is learned by gradient descent: data are fed into the network to obtain predictions, the predictions are compared with the ground-truth labels to obtain an error, the error is back-propagated to obtain the gradient of each parameter in the network, and the parameters are updated with these gradients so that the network output moves closer to the labels; in this way a concrete expression of the function is learned.
Combined with the loss function, the overall training objective can be defined as:
θ* = argmin_θ Σ_i L( Ŷ_i, f_θ(I^(i)) )
where Ŷ_i denotes the label map for the i-th rotation.
Collecting simulation data: the invention defines an adversarial grasping rule for objects in image space. Consider a scene in which only one object is placed in the workspace. Define c1 and c2 as the contact points of the two fingers with the object, n1 and n2 as their corresponding normal vectors, and g as the grasping direction of the gripper in image space, with c1, c2, n1, n2, g ∈ ℝ², as shown in FIG. 1. From these definitions:
g = (c2 − c1) / ||c2 − c1||
where ||·|| denotes the norm operation. The invention defines a grasp operation as an adversarial grasp when it satisfies the following conditions:
ω1 = ∠(g, n1), ω2 = ∠(g, n2)
ω1 ≤ θ1
ω2 ≥ θ2
where θ1 and θ2 are non-negative thresholds tending to 0 and π, respectively, on the angle between the grasping direction and the surface normals at the two contact points, and ω1 and ω2 are the angles between the grasping direction and the surface normals at the two contact points. In general, when the grasping direction of the gripper is parallel to the contact-point normals, the grasp is defined as a stable adversarial grasp.
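An illustrative check of this adversarial grasping condition (a sketch only; the function name and the default threshold values are assumptions, since the patent leaves θ1 and θ2 as free parameters):

    import numpy as np

    def is_adversarial_grasp(c1, c2, n1, n2, theta1=0.2, theta2=np.pi - 0.2):
        """c1, c2: 2-D contact points of the two fingers; n1, n2: surface normals there.
        theta1 (near 0) and theta2 (near pi) bound the angles between the grasping
        direction and the two contact normals."""
        g = (c2 - c1) / np.linalg.norm(c2 - c1)          # grasping direction in image space
        omega1 = np.arccos(np.clip(np.dot(g, n1) / np.linalg.norm(n1), -1.0, 1.0))
        omega2 = np.arccos(np.clip(np.dot(g, n2) / np.linalg.norm(n2), -1.0, 1.0))
        return omega1 <= theta1 and omega2 >= theta2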
In the actual implementation, the invention uses a corrective grasping strategy to collect samples that satisfy the adversarial grasping rule. A grasp angle and a pixel position on the object are first selected at random, and the camera records the whole workspace. The robotic arm is then controlled to perform a trial-and-error grasp. If the grasp fails, the workspace image I, the grasp pixel position p, the set C of all pixel positions occupied by the object in the image, the grasp angle ψ, and the label l are stored. If the grasp succeeds, the contact between the gripper and the object changes the object's pose; this corrective change makes the gripper's grasping direction approximately parallel to the contact-point normals, so the adversarial grasping rule is satisfied. The camera then re-records the corrected image I′, the object's pixel positions C′ are obtained again, and the image, the grasp-point pixel position, the set of all pixel positions occupied by the object, the grasp angle, and the label are stored.
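The corrective collection loop can be outlined as follows (a sketch only: sim.reset_scene, sim.render, sim.get_object_pixels and sim.execute_grasp are placeholders for the actual simulator interface, not calls to any real library):

    import random

    def collect_sample(sim, num_angles=16):
        """One trial-and-error grasp in simulation. Returns the stored tuple
        (image, object_pixel_set, grasp_pixel, grasp_angle, label)."""
        sim.reset_scene()                      # place one object in the workspace
        I = sim.render()                       # global image of the workspace
        C = sim.get_object_pixels()            # set of pixel positions occupied by the object
        p = random.choice(sorted(C))           # random grasp pixel on the object
        psi = random.randrange(num_angles)     # random discretized grasp angle
        success = sim.execute_grasp(p, psi)    # trial-and-error grasp

        if not success:
            return I, C, p, psi, 0             # failed grasp: store the original recording
        # On success the contact re-aligns the object, so the grasp direction becomes
        # roughly parallel to the contact normals; re-record the corrected scene.
        I_corr = sim.render()
        C_corr = sim.get_object_pixels()
        return I_corr, C_corr, p, psi, 1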
Defining the network structure:
The network structure is shown in FIG. 2. A fully convolutional neural network is adopted whose input is the global image containing the whole workspace. Features are first extracted using ResNet-50 as the encoder, then a four-layer upsampling module of bilinear interpolation plus convolution is applied, and finally a 5×5 convolution produces a grasp function map at the same scale as the input.
Defining the loss function:
Because most pixels in an image belong to the background class and the graspable and non-graspable labels are very sparse, training directly on such data would be very inefficient. The invention therefore computes the loss function in combination with a label mask. For pixels that belong to the object but were not trial-grasped, the value of the label mask at the corresponding position is set to 0; for all other pixels, the value at the corresponding position is set to 1. Let F ∈ ℝ^{H×W×3} denote the output feature map of the last convolutional layer. The corresponding loss function is:
L(Y, F) = − Σ_{i=1..H} Σ_{j=1..W} Σ_{k=1..3} M_{ijk} · Y_{ijk} · log( exp(F_{ijk}) / Σ_{l=1..3} exp(F_{ijl}) )
where Y is the label map of the sample, M is the label mask, H and W are the height and width of the label map, i, j and k are position indices in the 3-channel map, l is the channel index inside the softmax, and ℝ denotes the real field with superscripts giving tensor dimensions.
To reduce the influence of label sparsity, the invention increases the loss weights of the graspable and non-graspable classes and decreases the loss weight of the background: for graspable and non-graspable labels, the corresponding mask positions are multiplied by 120, while the background area is multiplied by 0.1.
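One possible reading of this masked, class-weighted loss as a PyTorch sketch (the 120 and 0.1 weights come from the text; folding them directly into the label mask, and the batch averaging, are assumptions):

    import torch
    import torch.nn.functional as F

    def masked_grasp_loss(logits, label_map, label_mask):
        """logits: (B, 3, H, W) output feature map of the last convolutional layer.
        label_map: (B, 3, H, W) one-hot labels (graspable / non-graspable / background).
        label_mask: (B, 3, H, W) weights: 0 for object pixels never trial-grasped,
        120 for trial-grasped pixels, 0.1 for background pixels."""
        log_prob = F.log_softmax(logits, dim=1)              # softmax over the 3 channels
        pixel_ce = -(label_mask * label_map * log_prob)      # masked cross-entropy per pixel
        return pixel_ce.sum() / logits.shape[0]              # average over the batch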
The specific implementation steps are as follows:
Step 1: Build an environment similar to the real scene in the simulation environment.
Step 1.1: Place a background texture, a robotic arm with a gripper, a camera, and an object to be grasped in the workspace of the simulation environment.
Step 1.2: Place the object in the workspace, select a position where the object exists using the camera, record the image, the pixel position corresponding to the grasp point, the mask of the object in the image, and the grasp angle, and then randomly select an angle for the robotic arm to perform a trial-and-error grasp.
Step 1.3: Judge whether the grasp succeeds. If it fails, directly save the image I, the set C of pixel positions occupied by the object in the image, the pixel position p corresponding to the grasp point, the grasp angle ψ, and the failure label l. If it succeeds, re-record the global image I′ and the corresponding set C′ of pixel positions occupied by the object, and then save the image I′, the set C′, the pixel position p corresponding to the grasp point, the grasp angle ψ, and the success label l. The collected global image is the global image defined in the grasp problem of the summary, and the grasp angle and grasp position are likewise defined in image space.
Step 2: Preprocess the data.
Step 2.1: Generate the object mask from the set of pixel positions occupied by the object in the image, generate the label mask from the object mask, the grasp pixel position, and the grasp label, and generate the label map from the grasp position and the grasp label (see the sketch after Step 2.3 below). In the label mask, the weights of the graspable and non-graspable regions are increased and the weight of the background is decreased.
Step 2.2: Discretize the grasp angle according to the problem definition. In this step the image is rotated to 16 angles, and the corresponding label map and label mask are rotated accordingly; because only horizontal grasps are considered, only the data whose grasping direction is parallel to the horizontal direction after rotation are retained.
Step 2.3: The preprocessed data comprise a global image containing the whole workspace, an object mask, and a label map at the same scale as the global image.
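A sketch of this preprocessing for a single sample (illustrative only; the function name is hypothetical, and the 0/1 base values of the label mask before re-weighting are an assumption):

    import numpy as np

    def make_training_target(image_hw, object_pixels, grasp_pixel, grasp_label):
        """image_hw: (H, W); object_pixels: iterable of (row, col) occupied by the object;
        grasp_pixel: (row, col) of the trial grasp; grasp_label: 1 = success, 0 = failure.
        Channel order of the label map: 0 = graspable, 1 = non-graspable, 2 = background."""
        H, W = image_hw
        object_mask = np.zeros((H, W), dtype=np.float32)
        for r, c in object_pixels:
            object_mask[r, c] = 1.0

        label_map = np.zeros((H, W, 3), dtype=np.float32)
        label_map[..., 2] = 1.0 - object_mask                 # background off the object
        r, c = grasp_pixel
        label_map[r, c] = 0.0
        label_map[r, c, 0 if grasp_label == 1 else 1] = 1.0   # the one trial-grasped pixel

        # Label mask: ignore untried object pixels, up-weight the labelled pixel (x120),
        # down-weight the background (x0.1).
        label_mask = np.zeros((H, W, 3), dtype=np.float32)
        label_mask[..., 2] = (1.0 - object_mask) * 0.1
        label_mask[r, c, :] = 120.0
        return object_mask, label_map, label_mask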
Step 3: Train the deep neural network.
Step 3.1: Normalize the input RGB images and assemble them into a batch.
Step 3.2: Feed the batch into the fully convolutional neural network defined above to obtain the output.
Step 3.3: Compute the error between the prediction and the label using the cross-entropy loss combined with the label mask:
L(Y, F) = − Σ_{i=1..H} Σ_{j=1..W} Σ_{k=1..3} M_{ijk} · Y_{ijk} · log( exp(F_{ijk}) / Σ_{l=1..3} exp(F_{ijl}) )
where Y is the label map, M is the label mask, H and W are the height and width of the label map, i, j and k are position indices in the 3-channel map, l is the channel index inside the softmax, F is the output feature map of the last convolutional layer, and ℝ denotes the real field with superscripts giving tensor dimensions.
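A schematic training step tying Steps 3.1-3.3 together (a sketch only: it assumes the GraspFCN and masked_grasp_loss sketches given earlier; the ImageNet normalization statistics and the optimizer choice are assumptions):

    import torch

    IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
    IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)

    def train_step(model, optimizer, rgb_batch, label_maps, label_masks):
        """rgb_batch: (B, 3, H, W) floats in [0, 1]; label_maps, label_masks: (B, 3, H, W)."""
        x = (rgb_batch - IMAGENET_MEAN) / IMAGENET_STD              # Step 3.1: normalize the batch
        logits = model(x)                                           # Step 3.2: forward through the FCN
        loss = masked_grasp_loss(logits, label_maps, label_masks)   # Step 3.3: masked cross-entropy
        optimizer.zero_grad()
        loss.backward()                                             # back-propagate the error
        optimizer.step()                                            # update the parameters
        return loss.item()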
Step 4: Apply the trained model in a real grasping environment.
Step 4.1: Acquire an RGB image and a depth map of the workspace with the camera.
Step 4.2: Normalize the RGB image, rotate it to 16 angles, and feed the rotated images into the fully convolutional neural network model to obtain 16 grasp function maps.
Step 4.3: According to the grasp problem definition, take the first channel of each function map, combine them, and find the position of the maximum value to obtain the best grasp position and grasp angle in image space.
Step 4.4: Map the obtained image position into 3-D space, solve the robot arm control command via inverse kinematics, rotate the end effector to the grasp angle after it reaches the position directly above the object, and determine the descent height of the arm from the acquired depth map to avoid collisions.
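An illustrative outline of the deployment loop in Steps 4.1-4.4 (camera.get_rgbd, camera.pixel_to_world and the robot.* calls are placeholders for the actual hardware interface; rotate_pixel_back, which maps the arg-max pixel from the rotated image back to the original image, is likewise a hypothetical helper; the assumption that the 16 angles cover 180 degrees is mine):

    import numpy as np
    import torch
    import torchvision.transforms.functional as TF

    @torch.no_grad()
    def grasp_once(model, camera, robot, num_angles=16):
        rgb, depth = camera.get_rgbd()                         # Step 4.1: RGB image and depth map
        x = torch.from_numpy(rgb).permute(2, 0, 1).float() / 255.0

        graspable_maps = []
        for i in range(num_angles):                            # Step 4.2: 16 rotated copies
            angle_deg = i * 180.0 / num_angles                 # assumed angular spacing
            rotated = TF.rotate(x, angle_deg)
            logits = model(rotated.unsqueeze(0))               # (1, 3, H, W) grasp function map
            graspable_maps.append(logits[0, 0].numpy())        # first channel = graspable

        G = np.stack(graspable_maps, axis=0)                   # Step 4.3: combine, take the max
        i_star, h_star, w_star = np.unravel_index(np.argmax(G), G.shape)
        h, w = rotate_pixel_back((h_star, w_star), i_star)     # placeholder: undo the rotation

        xyz = camera.pixel_to_world(h, w, depth[h, w])         # Step 4.4: image -> 3-D position
        robot.move_above(xyz)                                  # inverse kinematics inside
        robot.rotate_end_effector(i_star * 180.0 / num_angles)
        robot.descend_and_close(depth[h, w])                   # depth map bounds the descent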
It should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the invention and are not intended to limit its embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description; it is neither necessary nor possible to enumerate all embodiments exhaustively. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall be included in the protection scope of the claims of the present invention.

Claims (7)

1. A vision-based autonomous grasping method for a robotic arm, characterized by comprising the following steps:
S1. In a simulation environment, build a scene similar to the real one and collect a global image.
S2. Preprocess the data. The preprocessed data comprise a global image containing the whole workspace, an object mask, and a label map at the same scale as the global image. The processing comprises: first generating the object mask from the set of pixel positions occupied by the object in the image; then generating a label mask from the object mask, the grasp pixel position, and the grasp label, and generating the label map from the grasp position and the grasp label; and finally discretizing the grasping angle according to the grasp problem definition.
The grasp problem definition comprises: defining a vertical planar grasp as g = (p, ω, η), where p = (x, y, z) denotes the position of the grasp point in Cartesian coordinates, ω ∈ [0, 2π) denotes the rotation angle of the end effector, and η ∈ {0, 1}³ is a 3-dimensional one-hot code representing the grasp function; the grasp function has three classes: graspable, non-graspable, and background. Projected into image space, a grasp in image I can be represented as g̃ = (p̃, ω̃, η), where p̃ = (h, w) denotes the grasp position in the image and ω̃ denotes the discretized grasp angle. Each pixel in the image can be assigned a grasp function, so the whole grasp function map can be represented as C = {C_i | i = 1, …, 16}, where C_i ∈ ℝ^{H×W×3} is the grasp function map of the image at the i-th angle and its 3 channels represent the graspable, non-graspable, and background classes; the first channel C_i^(1) ∈ ℝ^{H×W} is extracted from each grasp function map C_i and these channels are combined to form G ∈ ℝ^{16×H×W}.
S3. Train a deep neural network:
(1) normalize the input RGB images and assemble them into a batch;
(2) feed the batch into the fully convolutional neural network to obtain the output;
(3) compute the error between the prediction and the label using a cross-entropy loss combined with the label mask:
L(Y, F) = − Σ_{i=1..H} Σ_{j=1..W} Σ_{k=1..3} M_{ijk} · Y_{ijk} · log( exp(F_{ijk}) / Σ_{l=1..3} exp(F_{ijl}) )
where Y is the label map, M is the label mask, H and W are the height and width of the label map, i, j and k are position indices in the 3-channel map, l is the channel index, F is the output feature map of the last convolutional layer, and ℝ denotes the real field with superscripts giving tensor dimensions.
S4. Apply the trained model in a real grasping environment.

2. The vision-based autonomous grasping method for a robotic arm according to claim 1, characterized in that step S1 specifically comprises:
S11. Place a background texture, a robotic arm with a gripper, a camera, and an object to be grasped in the workspace of the simulation environment.
S12. Place the object in the workspace, select a position where the object exists using the camera, record the image, the pixel position corresponding to the grasp point, the mask of the object in the image, and the grasp angle, and then randomly select an angle for the robotic arm to perform a trial-and-error grasp.
S13. Judge whether the grasp succeeds. If it fails, directly save the image I, the set C of pixel positions occupied by the object in the image, the pixel position p corresponding to the grasp point, the grasp angle ψ, and the failure label l. If it succeeds, re-record the global image I′ and the corresponding set C′ of pixel positions occupied by the object, and then save the image I′, the set C′, the pixel position p corresponding to the grasp point, the grasp angle ψ, and the success label l.

3. The vision-based autonomous grasping method for a robotic arm according to claim 2, characterized in that the most robust grasp point is obtained by solving:
i*, h*, w* = argmax_{i,h,w} G(i, h, w)
where G(i, h, w) denotes the confidence of the graspable class at rotation index i and image position (h, w); (h*, w*) is the position the end effector should reach in image space, and i* indicates the discretized angle to which the end effector rotates before executing the grasp.

4. The vision-based autonomous grasping method for a robotic arm according to claim 3, characterized in that during training a parameterized function f_θ is defined that realizes a pixel-level mapping from the image to the grasp function map:
C_i = f_θ(I^(i))
where I^(i) is image I rotated by the i-th discretization angle and C_i is the corresponding grasp function map; f_θ is implemented with a deep neural network; combined with the loss function, the overall training objective is defined as:
θ* = argmin_θ Σ_i L( Ŷ_i, f_θ(I^(i)) )
where Ŷ_i denotes the label map.

5. The vision-based autonomous grasping method for a robotic arm according to claim 4, characterized in that, considering a scene in which only one object is placed in the workspace, c1 and c2 are defined as the contact points of the two-finger gripper with the object, n1 and n2 are the corresponding normal vectors, and g is the grasping direction of the gripper in image space, with c1, c2, n1, n2, g ∈ ℝ²; from these definitions:
g = (c2 − c1) / ||c2 − c1||
where ||·|| denotes the norm operation; a grasp operation is defined as an adversarial grasp when it satisfies the following conditions:
ω1 = ∠(g, n1), ω2 = ∠(g, n2)
ω1 ≤ θ1
ω2 ≥ θ2
where θ1 and θ2 are non-negative thresholds tending to 0 and π, respectively, on the angle between the grasping direction and the surface normals at the two contact points, and ω1 and ω2 are the angles between the grasping direction and the surface normals at the two contact points; when the grasping direction of the gripper is parallel to the contact-point normals, the grasp is defined as a stable adversarial grasp.

6. The vision-based autonomous grasping method for a robotic arm according to claim 5, characterized in that step S4 comprises:
S41. Acquire an RGB image and a depth map of the workspace with the camera.
S42. Normalize the RGB image, rotate it to 16 angles, and feed the rotated images into the fully convolutional neural network model to obtain 16 grasp function maps.
S43. According to the grasp problem definition, take the first channel of each function map, combine them, and find the position of the maximum value to obtain the best grasp position and grasp angle in image space.
S44. Map the obtained image position into 3-D space, solve the robot arm control command via inverse kinematics, rotate the end effector to the grasp angle after it reaches the position directly above the object, and determine the descent height of the arm from the acquired depth map to avoid collisions.

7. The vision-based autonomous grasping method for a robotic arm according to claim 6, characterized in that step S42 specifically comprises: the image input to the fully convolutional neural network model contains a global image of the whole workspace; features are first extracted using ResNet-50 as the encoder, then a four-layer upsampling module of bilinear interpolation plus convolution is applied, and finally a 5×5 convolution produces a grasp function map at the same scale as the input.
CN201910335507.5A 2019-04-24 2019-04-24 Mechanical arm autonomous grabbing method based on vision Active CN110238840B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910335507.5A CN110238840B (en) 2019-04-24 2019-04-24 Mechanical arm autonomous grabbing method based on vision

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910335507.5A CN110238840B (en) 2019-04-24 2019-04-24 Mechanical arm autonomous grabbing method based on vision

Publications (2)

Publication Number Publication Date
CN110238840A CN110238840A (en) 2019-09-17
CN110238840B true CN110238840B (en) 2021-01-29

Family

ID=67883271

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910335507.5A Active CN110238840B (en) 2019-04-24 2019-04-24 Mechanical arm autonomous grabbing method based on vision

Country Status (1)

Country Link
CN (1) CN110238840B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110889460B (en) * 2019-12-06 2023-05-23 中山大学 Mechanical arm specified object grabbing method based on cooperative attention mechanism
CN111127548B (en) * 2019-12-25 2023-11-24 深圳市商汤科技有限公司 Grabbing position detection model training method, grabbing position detection method and grabbing position detection device
CN111325795B (en) * 2020-02-25 2023-07-25 深圳市商汤科技有限公司 Image processing method, device, storage medium and robot
CN111590577B (en) * 2020-05-19 2021-06-15 台州中盟联动企业管理合伙企业(有限合伙) Mechanical arm multi-parameter digital frequency conversion control method and device
CN112465825A (en) * 2021-02-02 2021-03-09 聚时科技(江苏)有限公司 Method for acquiring spatial position information of part based on image processing
CN116197887B (en) * 2021-11-28 2024-01-30 梅卡曼德(北京)机器人科技有限公司 Image data processing method, device, electronic equipment and storage medium for generating grabbing auxiliary image
CN114407011B (en) * 2022-01-05 2023-10-13 中科新松有限公司 Special-shaped workpiece grabbing planning method, planning device and special-shaped workpiece grabbing method

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7076313B2 (en) * 2003-06-06 2006-07-11 Visteon Global Technologies, Inc. Method for optimizing configuration of pick-and-place machine
KR101211601B1 (en) * 2010-11-05 2012-12-12 한국과학기술연구원 Motion Control System and Method for Grasping Object with Dual Arms of Robot
US8843236B2 (en) * 2012-03-15 2014-09-23 GM Global Technology Operations LLC Method and system for training a robot using human-assisted task demonstration
US20150314439A1 (en) * 2014-05-02 2015-11-05 Precision Machinery Research & Development Center End effector controlling method
EP3191264A4 (en) * 2014-09-12 2018-04-25 University of Washington Integration of auxiliary sensors with point cloud-based haptic rendering and virtual fixtures
CN109074513B (en) * 2016-03-03 2020-02-18 谷歌有限责任公司 Deep machine learning method and apparatus for robotic grasping
JP2018051704A (en) * 2016-09-29 2018-04-05 セイコーエプソン株式会社 Robot control device, robot, and robot system
CN106874914B (en) * 2017-01-12 2019-05-14 华南理工大学 A kind of industrial machinery arm visual spatial attention method based on depth convolutional neural networks
CN106846463B (en) * 2017-01-13 2020-02-18 清华大学 Three-dimensional reconstruction method and system of microscopic image based on deep learning neural network
CN106914897A (en) * 2017-03-31 2017-07-04 长安大学 Inverse Solution For Manipulator Kinematics method based on RBF neural
CN109407603B (en) * 2017-08-16 2020-03-06 北京猎户星空科技有限公司 Method and device for controlling mechanical arm to grab object
CN108161934B (en) * 2017-12-25 2020-06-09 清华大学 Method for realizing robot multi-axis hole assembly by utilizing deep reinforcement learning
CN108415254B (en) * 2018-03-12 2020-12-11 苏州大学 Control method of waste recycling robot based on deep Q network
CN109483534B (en) * 2018-11-08 2022-08-02 腾讯科技(深圳)有限公司 Object grabbing method, device and system

Also Published As

Publication number Publication date
CN110238840A (en) 2019-09-17

Similar Documents

Publication Publication Date Title
CN110238840B (en) Mechanical arm autonomous grabbing method based on vision
Cao et al. Suctionnet-1billion: A large-scale benchmark for suction grasping
EP3624999B1 (en) Machine learning methods and apparatus for robotic manipulation and that utilize multi-task domain adaptation
CN111079561A (en) A robot intelligent grasping method based on virtual training
CN112512755B (en) Robotic manipulation using domain-invariant 3D representations predicted from 2.5D visual data
CN109584298B (en) An online self-learning method for robotic autonomous object picking tasks
CN111331607B (en) Automatic grabbing and stacking method and system based on mechanical arm
CN111695562A (en) Autonomous robot grabbing method based on convolutional neural network
CN113826051A (en) Generating digital twins of interactions between solid system parts
CN111723782A (en) Visual robot grasping method and system based on deep learning
CN112295933B (en) A method for a robot to quickly sort goods
Tang et al. Learning collaborative pushing and grasping policies in dense clutter
CN113341706B (en) Man-machine cooperation assembly line system based on deep reinforcement learning
CN113752255B (en) Mechanical arm six-degree-of-freedom real-time grabbing method based on deep reinforcement learning
CN114131603B (en) Deep reinforcement learning robot grabbing method based on perception enhancement and scene migration
CN113172629A (en) Object grabbing method based on time sequence tactile data processing
CN110969660A (en) Robot feeding system based on three-dimensional stereoscopic vision and point cloud depth learning
CN108229678B (en) Network training method, operation control method, device, storage medium and equipment
CN116984269A (en) Gangue grabbing method and system based on image recognition
CN116600945A (en) Pixel-level prediction for grab generation
CN114952836A (en) Multi-fingered hand robot grasping method, device and robot system
CN113894058A (en) Quality detection and sorting method and system based on deep learning and storage medium
CN116664843A (en) Residual fitting grabbing detection network based on RGBD image and semantic segmentation
CN116214524A (en) Unmanned aerial vehicle capture method, device and storage medium for oil sample recovery
CN118990489A (en) Double-mechanical-arm cooperative carrying system based on deep reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant