CN106094516A - Robot adaptive grasping method based on deep reinforcement learning - Google Patents
Robot adaptive grasping method based on deep reinforcement learning
- Publication number
- CN106094516A (application CN201610402319.6A)
- Authority
- CN
- China
- Prior art keywords
- target
- robot
- network
- reinforcement learning
- strategy
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
- G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
- G05B13/04—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
- G05B13/042—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Automation & Control Theory (AREA)
- Image Analysis (AREA)
- Manipulator (AREA)
Abstract
The invention provides a robot adaptive grasping method based on deep reinforcement learning. The steps include: while at a certain distance from the target to be grasped, the robot captures photographs of the target with its front-mounted cameras, computes the target's position information from the photographs by the binocular ranging method, and uses the computed position information for robot navigation; when the target comes within the grasping range of the robotic arm, the robot photographs the target again with the front cameras and uses a pre-trained DDPG-based deep reinforcement learning network to perform dimensionality-reducing feature extraction on the photograph; the robot's control strategy is then derived from the feature extraction result, and the robot uses it to control its motion path and the pose of the robotic arm, thereby achieving adaptive grasping of the target. The method can adaptively grasp objects of varying size and shape at unfixed positions and has good prospects for market application.
Description
Technical Field
The invention relates to a method for a robot to grasp objects, in particular to a robot adaptive grasping method based on deep reinforcement learning.
Background Art
Autonomous robots are highly intelligent service robots capable of learning from the external environment. To perform basic activities such as localization, movement, and grasping, a robot must be equipped with a robotic arm and gripper and must fuse information from multiple sensors for machine learning (such as deep learning and reinforcement learning), interacting with the environment to realize perception, decision-making, and action. Most grasping robots today work under the assumption that the size, shape, and position of the object to be grasped are relatively fixed, and their grasping technology relies mainly on sensors such as ultrasonic, infrared, and laser ranging; their range of application is therefore very limited, and they cannot adapt to more complex grasping environments in which the size, shape, and position of the object are not fixed. At present, existing vision-based robot technology struggles to solve the "curse of dimensionality" posed by high-dimensional, data-heavy visual input, and neural networks trained by conventional machine learning converge poorly and cannot process input image information directly. Overall, the control technology of current vision-based grasping service robots has not yet reached satisfactory results, and further optimization is needed, especially in practice.
Summary of the Invention
The technical problem to be solved by the invention is that existing methods cannot adapt to more complex grasping environments in which the size, shape, and position of the object to be grasped are not fixed.
In order to solve the above technical problem, the invention provides a robot adaptive grasping method based on deep reinforcement learning, comprising the following steps:
Step 1: while at a certain distance from the target to be grasped, the robot captures photographs of the target with its front-mounted cameras, computes the target's position information from the photographs by the binocular ranging method, and uses the computed position information for robot navigation;
Step 2: the robot moves according to the navigation; when the target comes within the grasping range of the robotic arm, the robot photographs the target again with the front cameras and uses the pre-trained DDPG-based deep reinforcement learning network to perform dimensionality-reducing feature extraction on the photograph;
Step 3: the robot's control strategy is derived from the feature extraction result, and the robot uses the control strategy to control its motion path and the pose of the robotic arm, thereby achieving adaptive grasping of the target.
As a further limitation of the invention, the specific steps in Step 1 for computing the target's position information from the photographs by the binocular ranging method are:
Step 1.1: obtain the focal length f of the cameras, the center distance T_x between the left and right cameras, and the physical distances x_l and x_r from the projections of the target point on the left and right image planes to the left edge of the respective plane. The left and right image planes are rectangular and lie in the same imaging plane, and the optical-center projections of the two cameras lie at the centers of their respective image planes. The disparity d is then:

$$d = x_l - x_r \qquad (1)$$
Step 1.2: use the principle of similar triangles to build the Q matrix:

$$Q = \begin{bmatrix} 1 & 0 & 0 & -c_x \\ 0 & 1 & 0 & -c_y \\ 0 & 0 & 0 & f \\ 0 & 0 & -1/T_x & (c_x - c_x')/T_x \end{bmatrix} \qquad (2)$$

$$Q \begin{bmatrix} x \\ y \\ d \\ 1 \end{bmatrix} = \begin{bmatrix} X \\ Y \\ Z \\ W \end{bmatrix} \qquad (3)$$

In equations (2) and (3), (X, Y, Z) are the coordinates of the target point in the stereo coordinate system whose origin is the optical center of the left camera, W is the rotation-translation scale factor, (x, y) are the coordinates of the target point in the left image plane, c_x and c_y are the offsets of the coordinate systems of the left and right image planes from the origin of the stereo coordinate system, and c_x' is the corrected value of c_x;
Step 1.3: the spatial distance from the target point to the imaging plane is computed as:

$$Z = \frac{f \, T_x}{d} \qquad (4)$$

The position of the optical center of the left camera is taken as the robot's position, and the coordinate position information (X, Y, Z) of the target point is used as the navigation destination for robot navigation.
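For illustration, equations (1) and (4) together with the similar-triangle relations encoded in the Q matrix can be sketched in a few lines of Python; the calibration numbers below are hypothetical placeholders, not values from the patent:

```python
import numpy as np

# Hypothetical calibration values; the patent does not publish its own.
f = 700.0              # focal length, in pixels
Tx = 60.0              # center distance between the two cameras, e.g. in mm
cx, cy = 320.0, 240.0  # principal-point offsets of the left image plane

def locate_target(x, y, x_l, x_r):
    """Recover (X, Y, Z) of the target point from its pixel (x, y) in the
    left view and its projections x_l, x_r measured from the left edge of
    the respective image planes."""
    d = x_l - x_r          # disparity, equation (1)
    Z = f * Tx / d         # distance to the imaging plane, equation (4)
    X = (x - cx) * Z / f   # lateral offsets by similar triangles,
    Y = (y - cy) * Z / f   # as encoded in the Q matrix of equation (2)
    return X, Y, Z         # used as the robot's navigation destination

print(locate_target(350.0, 250.0, 350.0, 315.0))  # -> (~51.4, ~17.1, 1200.0)
```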
As a further limitation of the invention, the specific steps in Step 2 for performing dimensionality-reducing feature extraction on the photograph with the pre-trained DDPG-based deep reinforcement learning network are:
Step 2.1: since the target grasping process conforms to reinforcement learning and satisfies the Markov property, the set of observations and actions up to time t reduces to:

$$s_t = (x_1, a_1, \ldots, a_{t-1}, x_t) = x_t \qquad (5)$$

In equation (5), x_t and a_t are, respectively, the observation at time t and the action taken at time t;
Step 2.2: describe the expected return of the grasping process with the policy value function:

$$Q^{\pi}(s_t, a_t) = \mathbb{E}\left[R_t \mid s_t, a_t\right] \qquad (6)$$

In equation (6), $R_t = \sum_{i=t}^{T} \gamma^{\,i-t}\, r(s_i, a_i)$ is the discounted sum of future rewards obtained from time t, γ ∈ [0, 1] is the discount factor, r(s_t, a_t) is the reward function at time t, T is the time at which grasping ends, and π is the grasping policy;
Since the target grasping policy π is predetermined and deterministic, it can be written as a function μ: S → A, where S is the state space and A is the N-dimensional action space. Applying the Bellman equation to equation (6) gives:

$$Q^{\mu}(s_t, a_t) = \mathbb{E}_{s_{t+1} \sim E}\left[\, r(s_t, a_t) + \gamma\, Q^{\mu}\!\big(s_{t+1}, \mu(s_{t+1})\big) \right] \qquad (7)$$

In equation (7), s_{t+1} ∼ E indicates that the observation at time t+1 is obtained from the environment E, and μ(s_{t+1}) is the action to which the observation at time t+1 is mapped by the function μ;
Step 2.3: following the principle of maximum likelihood estimation, update the policy evaluation network Q(s, a | θ^Q), whose network weight parameters are θ^Q, by minimizing the loss function:

$$L(\theta^{Q}) = \mathbb{E}_{\mu'}\left[\big(Q(s_t, a_t \mid \theta^{Q}) - y_t\big)^{2}\right] \qquad (8)$$

In equation (8), $y_t = r(s_t, a_t) + \gamma\, Q\big(s_{t+1}, \mu(s_{t+1}) \mid \theta^{Q}\big)$ is given by the target policy evaluation network, and μ' is the target policy;
Step 2.4: for the actual policy function μ(s | θ^μ) with parameters θ^μ, the gradient obtained by the chain rule is:

$$\nabla_{\theta^{\mu}} J \approx \mathbb{E}_{\mu'}\left[\, \nabla_{a} Q(s, a \mid \theta^{Q})\big|_{s=s_t,\, a=\mu(s_t)} \; \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu})\big|_{s=s_t} \right] \qquad (9)$$

The gradient computed by equation (9) is the policy gradient, which is then used to update the policy function μ(s | θ^μ);
Step 2.5: train the network with an off-policy algorithm. The sample data used in training are drawn from a single sample buffer to minimize the correlation between samples, and a target Q-value network is used to train the neural network; that is, the experience replay mechanism and the target Q-value network method are adopted. The target networks are updated slowly:

$$\theta^{Q'} \leftarrow \tau\, \theta^{Q} + (1 - \tau)\, \theta^{Q'} \qquad (10)$$

$$\theta^{\mu'} \leftarrow \tau\, \theta^{\mu} + (1 - \tau)\, \theta^{\mu'} \qquad (11)$$

In equations (10) and (11), τ is the update rate, with τ ≪ 1. This constructs a DDPG-based deep reinforcement learning network, and one that converges;
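As a concrete illustration, one training update implementing equations (8) through (11) might look as follows in Python with PyTorch; the framework choice and the hyperparameter values are assumptions, since the patent specifies neither:

```python
import torch
import torch.nn.functional as F

GAMMA, TAU = 0.99, 0.001   # discount factor; update rate tau << 1 (assumed values)

def ddpg_update(batch, actor, critic, actor_target, critic_target,
                actor_opt, critic_opt):
    """One off-policy update on a batch drawn from the sample buffer."""
    s, a, r, s_next = batch   # states, actions, rewards, next states (tensors)

    # Policy evaluation network: minimize the loss of equation (8), with
    # y_t computed from the slowly-updated target networks of step 2.5.
    with torch.no_grad():
        y = r + GAMMA * critic_target(s_next, actor_target(s_next))
    critic_loss = F.mse_loss(critic(s, a), y)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Policy function: follow the chain-rule policy gradient of equation (9).
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    # Target networks: the slow updates of equations (10) and (11).
    for net, tgt in ((critic, critic_target), (actor, actor_target)):
        for p, p_tgt in zip(net.parameters(), tgt.parameters()):
            p_tgt.data.mul_(1.0 - TAU).add_(TAU * p.data)
```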
Step 2.6: use the constructed deep reinforcement learning network to perform dimensionality-reducing feature extraction on the photograph and obtain the robot's control strategy.
As a further limitation of the invention, the deep reinforcement learning network of step 2.6 consists of one image input layer, two convolutional layers, two fully connected layers, and one output layer. The image input layer receives the image containing the object to be grasped; the convolutional layers extract features, i.e., a deep representation of the image; the fully connected layers and the output layer form a deep network which, after training, maps input feature information to control commands, namely the servo angles of the robot's mechanical arm and the rotational speed of the DC motor driving the base car. Choosing two convolutional layers and two fully connected layers both extracts image features effectively and keeps the neural network easy to converge during training.
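A minimal sketch of this layer arrangement in PyTorch follows; the patent fixes the layer counts but not the image resolution, kernel sizes, or channel widths, so those are assumptions here (grayscale 84×84 input, five outputs: four servo angles plus one motor speed):

```python
import torch
import torch.nn as nn

class GraspPolicyNet(nn.Module):
    """Image input layer -> two conv layers -> two fully connected layers
    -> output layer, as described in step 2.6."""
    def __init__(self, n_actions=5):
        super().__init__()
        self.conv = nn.Sequential(   # convolutional layers: feature extraction
            nn.Conv2d(1, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
        )
        self.fc = nn.Sequential(     # deep network mapping features to commands
            nn.Flatten(),
            nn.Linear(32 * 9 * 9, 200), nn.ReLU(),
            nn.Linear(200, n_actions), nn.Tanh(),   # bounded control output
        )

    def forward(self, img):          # img: (batch, 1, 84, 84) grayscale photo
        return self.fc(self.conv(img))

policy = GraspPolicyNet()
print(policy(torch.zeros(1, 1, 84, 84)).shape)   # torch.Size([1, 5])
```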
The beneficial effects of the invention are: (1) using the experience replay mechanism and random sampling to determine the input images during pre-training effectively resolves the problem that consecutive photos are strongly correlated and thus fail the neural network's requirement that input data be mutually independent; (2) dimensionality reduction is achieved through deep learning, and the target Q-value network method continuously adjusts the neural network's weight matrices, ensuring as far as possible that the trained network converges; (3) the trained DDPG-based deep reinforcement learning neural network performs dimensionality reduction and object feature extraction and directly yields the robot's motion control strategy, effectively solving the "curse of dimensionality" problem.
Brief Description of the Drawings
Figure 1 is a schematic diagram of the system structure of the invention;
Figure 2 is a flowchart of the method of the invention;
Figure 3 is a plan view of the binocular ranging method of the invention;
Figure 4 is a perspective view of the binocular ranging technique of the invention;
Figure 5 is a schematic diagram of the composition of the DDPG-based deep reinforcement learning network of the invention.
Detailed Description
As shown in Figure 1, the robot adaptive grasping system based on the deep reinforcement learning method of the invention comprises an image processing system, a wireless communication system, and a robot motion system.
The image processing system consists mainly of cameras mounted on the front of the robot together with matlab software; the wireless communication system consists mainly of a WIFI module; the robot motion system consists mainly of a base car and a robotic arm. First, a deep reinforcement learning network based on DDPG (deep deterministic policy gradient) must be pre-trained on a dynamics simulation platform; during this process the experience replay mechanism and the target Q-value network are typically used to ensure that the DDPG-based network converges during pre-training. The image processing system then acquires an image of the target object and transmits the image information to the computer through the wireless communication system. While the robot is still far from the object to be grasped, binocular ranging is used to obtain the position information of the target object for use in robot navigation.
When the robot has moved to where the robotic arm can reach the object, another photograph of the object is taken, and the trained DDPG-based deep reinforcement learning network performs dimensionality reduction and feature extraction and outputs the robot's control strategy. Finally, the control strategy is transmitted through the wireless communication system to the robot motion system to control the robot's motion state and grasp the target object accurately.
During pre-training, matlab software first converts the RGB image of the target object to a grayscale image; the experience replay mechanism is then used to make the correlation between consecutive photos as small as possible, meeting the neural network's requirement that input data be mutually independent; finally, the images fed to the neural network are obtained by random sampling. Dimensionality reduction is achieved through deep learning, and the target Q-value network method continuously adjusts the network's weight matrices, ultimately producing a convergent neural network.
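A rough sketch of this pre-training data path (grayscale conversion, a shared sample buffer, random sampling) in Python; the buffer capacity and batch size are illustrative assumptions:

```python
import random
from collections import deque
import numpy as np

buffer = deque(maxlen=100_000)   # shared sample buffer for experience replay

def to_gray(rgb):
    """Convert an RGB photo of shape (H, W, 3) to a grayscale image."""
    return rgb @ np.array([0.299, 0.587, 0.114])

def store(obs_rgb, action, reward, next_obs_rgb):
    buffer.append((to_gray(obs_rgb), action, reward, to_gray(next_obs_rgb)))

def sample_batch(batch_size=64):
    """Random sampling keeps consecutive, highly correlated photos apart,
    so the network's inputs are closer to mutually independent."""
    return random.sample(list(buffer), batch_size)
```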
The robot is controlled by an Arduino board with a built-in WIFI module; the robotic arm consists of four servos, giving four degrees of freedom in total, and the base car is driven by a DC motor. The image processing system is based mainly on the cameras, their image transmission software, and matlab; photos of the target object taken by the cameras are transmitted to the computer by the WIFI module on the Arduino board and processed in matlab.
In operation, the system proceeds through the following steps:
Step 1: pre-train the DDPG (deep deterministic policy gradient)-based deep reinforcement learning network on a dynamics simulation platform, typically using the experience replay mechanism and the target Q-value network to ensure that the network converges during pre-training;
Step 2: acquire an image of the target object with the cameras mounted on the front of the robot, and transmit the image information to the computer via the WIFI module;
Step 3: while the robot is still far from the object to be grasped, use binocular ranging to obtain the position information of the target object and use it for robot navigation;
Step 4: when the robot has moved to where the robotic arm can reach the object, photograph the object again and use the trained DDPG-based deep reinforcement learning network to perform dimensionality reduction and feature extraction and output the robot's control strategy;
Step 5: transmit the control information to the robot motion system via the WIFI module to grasp the target object accurately.
As shown in Figures 3 and 4, binocular ranging exploits the fact that the distance from the target point to the imaging plane is inversely proportional to the difference between the horizontal coordinates at which the target point is imaged in the left and right views (i.e., the disparity). In general, the focal length is measured in pixels; the unit of the camera center distance is determined by the actual size of the calibration-board checkerboard and the value we enter, usually millimeters (set at the 0.1 mm level to improve precision); and the disparity is also measured in pixels. The pixel units in the numerator and denominator therefore cancel, and the distance from the target point to the imaging plane has the same unit as the camera center distance.
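As a worked example with illustrative numbers only (the patent publishes no calibration values): with a focal length of 700 pixels, a camera center distance of 60 mm, and a disparity of 35 pixels, equation (4) gives Z = 700 × 60 / 35 = 1200 mm; the pixel units cancel, and the distance carries the unit of the center distance.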
As shown in Figure 5, the DDPG-based deep reinforcement learning network consists mainly of one image input layer, two convolutional layers, two fully connected layers, and one output layer. The deep network architecture performs dimensionality reduction, the convolutional layers extract features, and the output layer outputs control information.
As shown in Figure 2, the invention provides a robot adaptive grasping method based on deep reinforcement learning, comprising the following steps:
Step 1: while at a certain distance from the target to be grasped, the robot captures photographs of the target with its front-mounted cameras, computes the target's position information from the photographs by the binocular ranging method, and uses the computed position information for robot navigation;
Step 2: the robot moves according to the navigation; when the target comes within the grasping range of the robotic arm, the robot photographs the target again with the front cameras and uses the pre-trained DDPG-based deep reinforcement learning network to perform dimensionality-reducing feature extraction on the photograph;
Step 3: the robot's control strategy is derived from the feature extraction result, and the robot uses the control strategy to control its motion path and the pose of the robotic arm, thereby achieving adaptive grasping of the target.
The specific steps in Step 1 for computing the target's position information from the photographs by the binocular ranging method are:
Step 1.1: obtain the focal length f of the cameras, the center distance T_x between the left and right cameras, and the physical distances x_l and x_r from the projections of the target point on the left and right image planes to the left edge of the respective plane. The left and right image planes are rectangular and lie in the same imaging plane, and the optical-center projections of the two cameras lie at the centers of their respective image planes, i.e., at the projections of O_l and O_r on the imaging plane. The disparity d is then:

$$d = x_l - x_r \qquad (1)$$
Step 1.2: use the principle of similar triangles to build the Q matrix:

$$Q = \begin{bmatrix} 1 & 0 & 0 & -c_x \\ 0 & 1 & 0 & -c_y \\ 0 & 0 & 0 & f \\ 0 & 0 & -1/T_x & (c_x - c_x')/T_x \end{bmatrix} \qquad (2)$$

$$Q \begin{bmatrix} x \\ y \\ d \\ 1 \end{bmatrix} = \begin{bmatrix} X \\ Y \\ Z \\ W \end{bmatrix} \qquad (3)$$

In equations (2) and (3), (X, Y, Z) are the coordinates of the target point in the stereo coordinate system whose origin is the optical center of the left camera, W is the rotation-translation scale factor, (x, y) are the coordinates of the target point in the left image plane, c_x and c_y are the offsets of the coordinate systems of the left and right image planes from the origin of the stereo coordinate system, and c_x' is the corrected value of c_x (the two values generally differ little, and in the invention they may be regarded as approximately equal);
Step 1.3: the spatial distance from the target point to the imaging plane is computed as:

$$Z = \frac{f \, T_x}{d} \qquad (4)$$

The position of the optical center of the left camera is taken as the robot's position, and the coordinate position information (X, Y, Z) of the target point is used as the navigation destination for robot navigation.
The specific steps in Step 2 for performing dimensionality-reducing feature extraction on the photograph with the pre-trained DDPG-based deep reinforcement learning network are:
Step 2.1: since the target grasping process conforms to reinforcement learning and satisfies the Markov property, the set of observations and actions up to time t reduces to:

$$s_t = (x_1, a_1, \ldots, a_{t-1}, x_t) = x_t \qquad (5)$$

In equation (5), x_t and a_t are, respectively, the observation at time t and the action taken at time t;
Step 2.2: describe the expected return of the grasping process with the policy value function:

$$Q^{\pi}(s_t, a_t) = \mathbb{E}\left[R_t \mid s_t, a_t\right] \qquad (6)$$

In equation (6), $R_t = \sum_{i=t}^{T} \gamma^{\,i-t}\, r(s_i, a_i)$ is the discounted sum of future rewards obtained from time t, γ ∈ [0, 1] is the discount factor, r(s_t, a_t) is the reward function at time t, T is the time at which grasping ends, and π is the grasping policy;
Since the target grasping policy π is predetermined and deterministic, it can be written as a function μ: S → A, where S is the state space and A is the N-dimensional action space. Applying the Bellman equation to equation (6) gives:

$$Q^{\mu}(s_t, a_t) = \mathbb{E}_{s_{t+1} \sim E}\left[\, r(s_t, a_t) + \gamma\, Q^{\mu}\!\big(s_{t+1}, \mu(s_{t+1})\big) \right] \qquad (7)$$

In equation (7), s_{t+1} ∼ E indicates that the observation at time t+1 is obtained from the environment E, and μ(s_{t+1}) is the action to which the observation at time t+1 is mapped by the function μ;
Step 2.3: following the principle of maximum likelihood estimation, update the policy evaluation network Q(s, a | θ^Q), whose network weight parameters are θ^Q, by minimizing the loss function:

$$L(\theta^{Q}) = \mathbb{E}_{\mu'}\left[\big(Q(s_t, a_t \mid \theta^{Q}) - y_t\big)^{2}\right] \qquad (8)$$

In equation (8), $y_t = r(s_t, a_t) + \gamma\, Q\big(s_{t+1}, \mu(s_{t+1}) \mid \theta^{Q}\big)$ is given by the target policy evaluation network, and μ' is the target policy;
Step 2.4: for the actual policy function μ(s | θ^μ) with parameters θ^μ, the gradient obtained by the chain rule is:

$$\nabla_{\theta^{\mu}} J \approx \mathbb{E}_{\mu'}\left[\, \nabla_{a} Q(s, a \mid \theta^{Q})\big|_{s=s_t,\, a=\mu(s_t)} \; \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu})\big|_{s=s_t} \right] \qquad (9)$$

The gradient computed by equation (9) is the policy gradient, which is then used to update the policy function μ(s | θ^μ);
Step 2.5: train the network with an off-policy algorithm. The sample data used in training are drawn from a single sample buffer to minimize the correlation between samples, and a target Q-value network is used to train the neural network; that is, the experience replay mechanism and the target Q-value network method are adopted. The target networks are updated slowly:

$$\theta^{Q'} \leftarrow \tau\, \theta^{Q} + (1 - \tau)\, \theta^{Q'} \qquad (10)$$

$$\theta^{\mu'} \leftarrow \tau\, \theta^{\mu} + (1 - \tau)\, \theta^{\mu'} \qquad (11)$$

In equations (10) and (11), τ is the update rate, with τ ≪ 1. This constructs a DDPG-based deep reinforcement learning network, and one that converges;
Step 2.6: use the constructed deep reinforcement learning network to perform dimensionality-reducing feature extraction on the photograph and obtain the robot's control strategy. The deep reinforcement learning network consists of one image input layer, two convolutional layers, two fully connected layers, and one output layer; two convolutional layers and two fully connected layers are chosen so that image features can be extracted effectively while keeping the neural network easy to converge during training. The image input layer receives the image containing the object to be grasped; the convolutional layers extract features, i.e., a deep representation of the image, such as lines, edges, and arcs; the fully connected layers and the output layer form a deep network which, after training, maps input feature information to control commands, namely the servo angles of the robot's mechanical arm and the rotational speed of the DC motor driving the base car.
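For illustration, the mapping from the network's bounded output to the four servo angles and the DC-motor speed could look as follows; the ranges and the five-dimensional action layout are hypothetical, since the patent only states that the commands are sent to the Arduino board over WIFI:

```python
import numpy as np

SERVO_RANGE = (0.0, 180.0)     # assumed servo angle range, in degrees
MOTOR_MAX = 255.0              # assumed DC-motor PWM magnitude

def action_to_command(action):
    """Scale a tanh output in [-1, 1]^5 to 4 servo angles and 1 motor speed."""
    a = np.clip(np.asarray(action, dtype=float), -1.0, 1.0)
    lo, hi = SERVO_RANGE
    servo_angles = (a[:4] + 1.0) / 2.0 * (hi - lo) + lo
    motor_speed = a[4] * MOTOR_MAX
    return servo_angles, motor_speed   # serialized and sent over WIFI
```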
When pre-training the neural network, the invention uses the experience replay mechanism and random sampling to determine the input images, which effectively resolves the problem that consecutive photos are strongly correlated and thus fail the neural network's requirement that input data be mutually independent. Dimensionality reduction is achieved through deep learning, and the target Q-value network method continuously adjusts the neural network's weight matrices, ensuring as far as possible that the trained network converges. The trained DDPG-based deep reinforcement learning neural network performs dimensionality reduction and object feature extraction and directly yields the robot's motion control strategy, effectively solving the "curse of dimensionality" problem.
Claims (4)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610402319.6A CN106094516A (en) | 2016-06-08 | 2016-06-08 | Robot adaptive grasping method based on deep reinforcement learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610402319.6A CN106094516A (en) | 2016-06-08 | 2016-06-08 | Robot adaptive grasping method based on deep reinforcement learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106094516A (en) | 2016-11-09 |
Family
ID=57228280
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610402319.6A Pending CN106094516A (en) | 2016-06-08 | 2016-06-08 | A kind of robot self-adapting grasping method based on deeply study |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106094516A (en) |
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080133053A1 (en) * | 2006-11-29 | 2008-06-05 | Honda Motor Co., Ltd. | Determination of Foot Placement for Humanoid Push Recovery |
CN102521205A (en) * | 2011-11-23 | 2012-06-27 | 河海大学常州校区 | Multi-Agent based robot combined search system by reinforcement learning |
CN102902271A (en) * | 2012-10-23 | 2013-01-30 | 上海大学 | Binocular vision-based robot target identifying and gripping system and method |
CN203390936U (en) * | 2013-04-26 | 2014-01-15 | 上海锡明光电科技有限公司 | Self-adaption automatic robotic system realizing dynamic and real-time capture function |
CN105637540A (en) * | 2013-10-08 | 2016-06-01 | 谷歌公司 | Methods and apparatus for reinforcement learning |
CN104778721A (en) * | 2015-05-08 | 2015-07-15 | 哈尔滨工业大学 | Distance measuring method of significant target in binocular image |
CN105137967A (en) * | 2015-07-16 | 2015-12-09 | 北京工业大学 | Mobile robot path planning method with combination of depth automatic encoder and Q-learning algorithm |
CN105115497A (en) * | 2015-09-17 | 2015-12-02 | 南京大学 | A reliable indoor mobile robot precise navigation and positioning system and method |
CN105425828A (en) * | 2015-11-11 | 2016-03-23 | 山东建筑大学 | Robot anti-impact double-arm coordination control system based on sensor fusion technology |
CN105459136A (en) * | 2015-12-29 | 2016-04-06 | 上海帆声图像科技有限公司 | Robot vision grasping method |
Non-Patent Citations (3)
Title |
---|
Timothy P. Lillicrap et al.: "Continuous Control with Deep Reinforcement Learning", Google DeepMind, ICLR 2016 *
Shi Zhongzhi: "Mind Computation", 31 August 2015, Tsinghua University Press *
Chen Qiang: "Three-Dimensional Reconstruction Based on Binocular Stereo Vision", Graphics and Image *
Cited By (91)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107168110A (en) * | 2016-12-09 | 2017-09-15 | 陈胜辉 | A kind of material grasping means and system |
CN106600650A (en) * | 2016-12-12 | 2017-04-26 | 杭州蓝芯科技有限公司 | Binocular visual sense depth information obtaining method based on deep learning |
CN106780605A (en) * | 2016-12-20 | 2017-05-31 | 芜湖哈特机器人产业技术研究院有限公司 | A kind of detection method of the object crawl position based on deep learning robot |
CN106737673A (en) * | 2016-12-23 | 2017-05-31 | 浙江大学 | A kind of method of the control of mechanical arm end to end based on deep learning |
CN106737673B (en) * | 2016-12-23 | 2019-06-18 | 浙江大学 | A method of the control of mechanical arm end to end based on deep learning |
CN106873585A (en) * | 2017-01-18 | 2017-06-20 | 无锡辰星机器人科技有限公司 | One kind navigation method for searching, robot and system |
CN107186708B (en) * | 2017-04-25 | 2020-05-12 | 珠海智卓投资管理有限公司 | Hand-eye servo robot grabbing system and method based on deep learning image segmentation technology |
CN107186708A (en) * | 2017-04-25 | 2017-09-22 | 江苏安格尔机器人有限公司 | Trick servo robot grasping system and method based on deep learning image Segmentation Technology |
CN107092254B (en) * | 2017-04-27 | 2019-11-29 | 北京航空航天大学 | A kind of design method of the Household floor-sweeping machine device people based on depth enhancing study |
CN107092254A (en) * | 2017-04-27 | 2017-08-25 | 北京航空航天大学 | A kind of design method for the Household floor-sweeping machine device people for strengthening study based on depth |
CN106970594A (en) * | 2017-05-09 | 2017-07-21 | 京东方科技集团股份有限公司 | A kind of method for planning track of flexible mechanical arm |
CN106970594B (en) * | 2017-05-09 | 2019-02-12 | 京东方科技集团股份有限公司 | A kind of method for planning track of flexible mechanical arm |
CN107139179B (en) * | 2017-05-26 | 2020-05-29 | 西安电子科技大学 | A kind of intelligent service robot and working method |
CN107139179A (en) * | 2017-05-26 | 2017-09-08 | 西安电子科技大学 | A kind of intellect service robot and method of work |
US11554483B2 (en) | 2017-06-19 | 2023-01-17 | Google Llc | Robotic grasping prediction using neural networks and geometry aware object representation |
CN110691676A (en) * | 2017-06-19 | 2020-01-14 | 谷歌有限责任公司 | Robot crawling prediction using neural networks and geometrically-aware object representations |
US11150655B2 (en) | 2017-06-30 | 2021-10-19 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Method and system for training unmanned aerial vehicle control model based on artificial intelligence |
CN107479368A (en) * | 2017-06-30 | 2017-12-15 | 北京百度网讯科技有限公司 | A kind of method and system of the training unmanned aerial vehicle (UAV) control model based on artificial intelligence |
CN107367929A (en) * | 2017-07-19 | 2017-11-21 | 北京上格云技术有限公司 | Update method, storage medium and the terminal device of Q value matrixs |
CN109407603B (en) * | 2017-08-16 | 2020-03-06 | 北京猎户星空科技有限公司 | Method and device for controlling mechanical arm to grab object |
CN109407603A (en) * | 2017-08-16 | 2019-03-01 | 北京猎户星空科技有限公司 | A kind of method and device of control mechanical arm crawl object |
CN108305275A (en) * | 2017-08-25 | 2018-07-20 | 深圳市腾讯计算机系统有限公司 | Active tracking method, apparatus and system |
CN107562052A (en) * | 2017-08-30 | 2018-01-09 | 唐开强 | A kind of Hexapod Robot gait planning method based on deeply study |
CN107450593B (en) * | 2017-08-30 | 2020-06-12 | 清华大学 | Unmanned aerial vehicle autonomous navigation method and system |
CN107450593A (en) * | 2017-08-30 | 2017-12-08 | 清华大学 | A kind of unmanned plane autonomous navigation method and system |
CN107450555A (en) * | 2017-08-30 | 2017-12-08 | 唐开强 | A kind of Hexapod Robot real-time gait planing method based on deeply study |
CN107748566B (en) * | 2017-09-20 | 2020-04-24 | 清华大学 | Underwater autonomous robot fixed depth control method based on reinforcement learning |
CN107748566A (en) * | 2017-09-20 | 2018-03-02 | 清华大学 | A kind of underwater autonomous robot constant depth control method based on intensified learning |
CN107479501A (en) * | 2017-09-28 | 2017-12-15 | 广州智能装备研究院有限公司 | 3D parts suction methods based on deep learning |
CN108051999A (en) * | 2017-10-31 | 2018-05-18 | 中国科学技术大学 | Accelerator beam path control method and system based on deeply study |
CN109807882B (en) * | 2017-11-20 | 2022-09-16 | 株式会社安川电机 | Gripping system, learning device, and gripping method |
US11338435B2 (en) | 2017-11-20 | 2022-05-24 | Kabushiki Kaisha Yaskawa Denki | Gripping system with machine learning |
CN109807882A (en) * | 2017-11-20 | 2019-05-28 | 株式会社安川电机 | Holding system, learning device and holding method |
CN108052004B (en) * | 2017-12-06 | 2020-11-10 | 湖北工业大学 | Automatic control method of industrial robotic arm based on deep reinforcement learning |
CN108052004A (en) * | 2017-12-06 | 2018-05-18 | 湖北工业大学 | Industrial machinery arm autocontrol method based on depth enhancing study |
CN109909998A (en) * | 2017-12-12 | 2019-06-21 | 北京猎户星空科技有限公司 | A kind of method and device controlling manipulator motion |
CN109909998B (en) * | 2017-12-12 | 2020-10-02 | 北京猎户星空科技有限公司 | Method and device for controlling movement of mechanical arm |
CN108321795A (en) * | 2018-01-19 | 2018-07-24 | 上海交通大学 | Start-stop of generator set configuration method based on depth deterministic policy algorithm and system |
CN108321795B (en) * | 2018-01-19 | 2021-01-22 | 上海交通大学 | Start-stop configuration method and system for generator set based on deep deterministic strategy algorithm |
US11887000B2 (en) | 2018-02-09 | 2024-01-30 | Deepmind Technologies Limited | Distributional reinforcement learning using quantile function neural networks |
EP3701432A1 (en) * | 2018-02-09 | 2020-09-02 | DeepMind Technologies Limited | Distributional reinforcement learning using quantile function neural networks |
WO2019155061A1 (en) * | 2018-02-09 | 2019-08-15 | Deepmind Technologies Limited | Distributional reinforcement learning using quantile function neural networks |
US11610118B2 (en) | 2018-02-09 | 2023-03-21 | Deepmind Technologies Limited | Distributional reinforcement learning using quantile function neural networks |
US12205032B2 (en) | 2018-02-09 | 2025-01-21 | Deepmind Technologies Limited | Distributional reinforcement learning using quantile function neural networks |
CN108415254B (en) * | 2018-03-12 | 2020-12-11 | 苏州大学 | Control method of waste recycling robot based on deep Q network |
CN108594804B (en) * | 2018-03-12 | 2021-06-18 | 苏州大学 | Automatic driving control method of delivery car based on deep Q network |
CN108594804A (en) * | 2018-03-12 | 2018-09-28 | 苏州大学 | Automatic driving control method for distribution trolley based on deep Q network |
CN108415254A (en) * | 2018-03-12 | 2018-08-17 | 苏州大学 | Waste recycling robot control method and device based on deep Q network |
CN108536011A (en) * | 2018-03-19 | 2018-09-14 | 中山大学 | A kind of Hexapod Robot complicated landform adaptive motion control method based on deeply study |
CN110293549A (en) * | 2018-03-21 | 2019-10-01 | 北京猎户星空科技有限公司 | Mechanical arm control method, device and neural network model training method, device |
CN110293549B (en) * | 2018-03-21 | 2021-06-22 | 北京猎户星空科技有限公司 | Mechanical arm control method and device and neural network model training method and device |
CN110427021A (en) * | 2018-05-01 | 2019-11-08 | 本田技研工业株式会社 | System and method for generating automatic driving vehicle intersection navigation instruction |
CN110427021B (en) * | 2018-05-01 | 2024-04-12 | 本田技研工业株式会社 | System and method for generating navigation instructions for an autonomous vehicle intersection |
CN108873687A (en) * | 2018-07-11 | 2018-11-23 | 哈尔滨工程大学 | A kind of Intelligent Underwater Robot behavior system knot planing method based on depth Q study |
CN109344877A (en) * | 2018-08-31 | 2019-02-15 | 深圳先进技术研究院 | A sample data processing method, sample data processing device and electronic equipment |
CN109344877B (en) * | 2018-08-31 | 2020-12-11 | 深圳先进技术研究院 | A sample data processing method, sample data processing device and electronic equipment |
CN109116854A (en) * | 2018-09-16 | 2019-01-01 | 南京大学 | A kind of robot cooperated control method of multiple groups based on intensified learning and control system |
CN109523029B (en) * | 2018-09-28 | 2020-11-03 | 清华大学深圳研究生院 | Self-adaptive double-self-driven depth certainty strategy gradient reinforcement learning method |
CN109523029A (en) * | 2018-09-28 | 2019-03-26 | 清华大学深圳研究生院 | For the adaptive double from driving depth deterministic policy Gradient Reinforcement Learning method of training smart body |
CN109063827B (en) * | 2018-10-25 | 2022-03-04 | 电子科技大学 | Method, system, storage medium and terminal for automatically taking specific luggage in limited space |
CN109063827A (en) * | 2018-10-25 | 2018-12-21 | 电子科技大学 | It takes automatically in the confined space method, system, storage medium and the terminal of specific luggage |
CN109358628A (en) * | 2018-11-06 | 2019-02-19 | 江苏木盟智能科技有限公司 | A kind of container alignment method and robot |
CN109483534A (en) * | 2018-11-08 | 2019-03-19 | 腾讯科技(深圳)有限公司 | A kind of grasping body methods, devices and systems |
US10926416B2 (en) | 2018-11-21 | 2021-02-23 | Ford Global Technologies, Llc | Robotic manipulation using an independently actuated vision system, an adversarial control scheme, and a multi-tasking deep learning architecture |
CN111347411A (en) * | 2018-12-20 | 2020-06-30 | 中国科学院沈阳自动化研究所 | 3D visual recognition and grasping method of dual-arm collaborative robot based on deep learning |
CN111347411B (en) * | 2018-12-20 | 2023-01-24 | 中国科学院沈阳自动化研究所 | Three-dimensional visual recognition and grasping method of dual-arm collaborative robot based on deep learning |
CN109760046A (en) * | 2018-12-27 | 2019-05-17 | 西北工业大学 | Motion planning method for capturing rolling target of space robot based on reinforcement learning |
WO2020134254A1 (en) * | 2018-12-27 | 2020-07-02 | 南京芊玥机器人科技有限公司 | Method employing reinforcement learning to optimize trajectory of spray painting robot |
CN110323981A (en) * | 2019-05-14 | 2019-10-11 | 广东省智能制造研究所 | A kind of method and system controlling permanent magnetic linear synchronous motor |
CN110202583A (en) * | 2019-07-09 | 2019-09-06 | 华南理工大学 | A kind of Apery manipulator control system and its control method based on deep learning |
CN110400345B (en) * | 2019-07-24 | 2021-06-15 | 西南科技大学 | A push-grab collaborative sorting method for radioactive waste based on deep reinforcement learning |
CN110400345A (en) * | 2019-07-24 | 2019-11-01 | 西南科技大学 | Push and Grab Collaborative Sorting Method for Radioactive Waste Based on Deep Reinforcement Learning |
CN110328668B (en) * | 2019-07-27 | 2022-03-22 | 南京理工大学 | Path Planning Method of Robot Arm Based on Velocity Smooth Deterministic Policy Gradient |
CN110328668A (en) * | 2019-07-27 | 2019-10-15 | 南京理工大学 | Robotic arm path planing method based on rate smoothing deterministic policy gradient |
CN110394804B (en) * | 2019-08-26 | 2022-08-12 | 山东大学 | A robot control method, controller and system based on layered thread framework |
CN110394804A (en) * | 2019-08-26 | 2019-11-01 | 山东大学 | A robot control method, controller and system based on layered thread framework |
CN110722556A (en) * | 2019-10-17 | 2020-01-24 | 苏州恒辉科技有限公司 | Movable mechanical arm control system and method based on reinforcement learning |
CN112757284B (en) * | 2019-10-21 | 2024-03-22 | 佳能株式会社 | Robot control device, method, and storage medium |
CN112757284A (en) * | 2019-10-21 | 2021-05-07 | 佳能株式会社 | Robot control apparatus, method and storage medium |
CN111618847B (en) * | 2020-04-22 | 2022-06-21 | 南通大学 | Autonomous grasping method of robotic arm based on deep reinforcement learning and dynamic motion primitives |
CN111618847A (en) * | 2020-04-22 | 2020-09-04 | 南通大学 | Mechanical arm autonomous grabbing method based on deep reinforcement learning and dynamic motion elements |
CN112347900B (en) * | 2020-11-04 | 2022-10-14 | 中国海洋大学 | An automatic grasping method of monocular vision underwater target based on distance estimation |
CN112347900A (en) * | 2020-11-04 | 2021-02-09 | 中国海洋大学 | Monocular vision underwater target automatic grabbing method based on distance estimation |
CN112734759A (en) * | 2021-03-30 | 2021-04-30 | 常州微亿智造科技有限公司 | Method and device for determining trigger point of flying shooting |
CN113836788A (en) * | 2021-08-24 | 2021-12-24 | 浙江大学 | Acceleration method for flow industry reinforcement learning control based on local data enhancement |
CN113836788B (en) * | 2021-08-24 | 2023-10-27 | 浙江大学 | Acceleration method for flow industrial reinforcement learning control based on local data enhancement |
CN114454160A (en) * | 2021-12-31 | 2022-05-10 | 中国人民解放军国防科技大学 | Mechanical arm grabbing control method and system based on kernel least square soft Bellman residual reinforcement learning |
CN114454160B (en) * | 2021-12-31 | 2024-04-16 | 中国人民解放军国防科技大学 | Mechanical arm grabbing control method and system based on kernel least square soft Belman residual error reinforcement learning |
CN115890726A (en) * | 2022-11-10 | 2023-04-04 | 大连理工大学 | A Method for Generating Parallel Fixture Shapes Based on DDPG Reinforcement Learning Algorithm |
CN115890726B (en) * | 2022-11-10 | 2025-05-20 | 大连理工大学 | Parallel clamp shape generation method based on DDPG reinforcement learning algorithm |
CN117516530A (en) * | 2023-09-28 | 2024-02-06 | 中国科学院自动化研究所 | Robot target navigation method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106094516A (en) | Robot adaptive grasping method based on deep reinforcement learning | |
EP3880413B1 (en) | Method and system for trajectory optimization for vehicles with geometric constraints | |
US11745355B2 (en) | Control device, control method, and non-transitory computer-readable storage medium | |
EP3825903B1 (en) | Method, apparatus and storage medium for detecting small obstacles | |
CN113671994B (en) | Multi-unmanned aerial vehicle and multi-unmanned ship inspection control system based on reinforcement learning | |
CN105225269B (en) | Object modelling system based on motion | |
CN109108942B (en) | Mechanical arm motion control method and system based on visual real-time teaching and adaptive DMPS | |
CN104777839B (en) | Robot autonomous barrier-avoiding method based on BP neural network and range information | |
CN105014667B (en) | A Relative Pose Calibration Method of Camera and Robot Based on Pixel Space Optimization | |
CN103895042A (en) | Industrial robot workpiece positioning grabbing method and system based on visual guidance | |
CN114851201B (en) | A six-degree-of-freedom visual closed-loop grasping method for robotic arm based on TSDF 3D reconstruction | |
CN105425828A (en) | Robot anti-impact double-arm coordination control system based on sensor fusion technology | |
CN110744541A (en) | Vision-guided underwater mechanical arm control method | |
Taryudi et al. | Eye to hand calibration using ANFIS for stereo vision-based object manipulation system | |
CN114770461B (en) | Mobile robot based on monocular vision and automatic grabbing method thereof | |
CN103991077B (en) | A shared control method for robot hand controllers based on force fusion | |
CN103759716A (en) | Dynamic target position and attitude measurement method based on monocular vision at tail end of mechanical arm | |
Hsieh et al. | Robotic arm assistance system based on simple stereo matching and Q-learning optimization | |
US11769269B2 (en) | Fusing multiple depth sensing modalities | |
CN104476544A (en) | Self-adaptive dead zone inverse model generating device of visual servo mechanical arm system | |
CN112347900B (en) | An automatic grasping method of monocular vision underwater target based on distance estimation | |
Zhou et al. | Adaptive leader-follower formation control and obstacle avoidance via deep reinforcement learning | |
CN108151713A (en) | A kind of quick position and orientation estimation methods of monocular VO | |
CN115194774B (en) | A dual-arm grasping system control method based on multi-vision | |
CN102736626A (en) | Vision-based pose stabilization control method of moving trolley |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20161109 |