CN117095310A - Method for acquiring visual servo model, visual servo method and device

Info

Publication number
CN117095310A
Authority
CN
China
Prior art keywords
speed
data set
dataset
camera
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210524056.1A
Other languages
Chinese (zh)
Inventor
汪常进
王民航
俞鸿翔
楚亚奎
贺亚农
王越
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202210524056.1A
Publication of CN117095310A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • G06V20/17 Terrestrial scenes taken from planes or by drones
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Remote Sensing (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Manipulator (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present application provide a method for acquiring a visual servo model, a visual servo method, and a visual servo device. The method for acquiring the visual servo model comprises the following steps: acquiring a first image feature dataset comprising a first desired image feature and a plurality of first initial image features; acquiring a first speed dataset, wherein each speed in the first speed dataset is determined from the camera pose corresponding to the first desired image feature and the camera pose corresponding to one of the first initial image features, the speeds corresponding one-to-one with the first initial image features; and training a neural network model based on the first image feature dataset and the first speed dataset to obtain a first visual servo model. The visual servo model obtained in this way can improve the control performance of visual servoing.

Description

Method for acquiring visual servo model, visual servo method and device
Technical Field
Embodiments of the present application relate to the field of computer vision, and in particular to a method for acquiring a visual servo model, a visual servo method, and a visual servo device.
Background
Visual servoing is a control mechanism that uses visual information as feedback, extracting geometric features from that information to guide the motion of a robot; it is widely applied in fields such as industrial assembly-line production and aerospace. Visual servoing can be divided into image-based visual servoing (IBVS) and position-based visual servoing (PBVS).
An IBVS algorithm computes the control signal from geometric image features and can rapidly eliminate the error between the visual features and their set-points in the two-dimensional image space. However, the Cartesian-space trajectory generated by an IBVS controller can be very complex. In some application scenarios, such as a drone guided to land by IBVS, the initial camera pose may differ greatly from the target pose, making the trajectory in 3D space hard to predict and potentially dangerous; in particular, when the target object lies at the edge of the camera's field of view, image features may be lost, causing the servoing to fail.
Compared with IBVS, PBVS uses the 3D relative pose between the object and the camera as the visual feature; it can follow a nearly perfect straight line in Cartesian space and is globally asymptotically stable. However, the pose estimation in PBVS depends on environment parameters such as the intrinsic parameters of the vision system and the 3D model of the object. These parameters are difficult to measure accurately in practice, and errors in them, especially errors in the object's 3D model, can greatly reduce the control accuracy of PBVS.
How to improve the control performance of visual servoing is a problem to be solved.
Disclosure of Invention
Embodiments of the present application provide a method for acquiring a visual servo model, a visual servo method, and a visual servo device, which can improve the control performance of visual servoing.
In a first aspect, a method for acquiring a visual servo model is provided, comprising: acquiring a first image feature dataset comprising a first desired image feature and a plurality of first initial image features; acquiring a first speed dataset, wherein each speed in the first speed dataset is determined from the camera pose corresponding to the first desired image feature and the camera pose corresponding to one of the first initial image features, the speeds corresponding one-to-one with the first initial image features; and training a neural network model based on the first image feature dataset and the first speed dataset to obtain a first visual servo model.
The method provided by the embodiments of the present application trains the neural network model on image features paired with the speeds of a position-based visual servo controller. The resulting visual servo model can therefore control the robot to move along a trajectory that is as straight as possible while minimizing the error between the initial image features and the desired image features. This improves the efficiency and stability of visual servoing, and hence its control performance, while maintaining high servo accuracy.
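For illustration of how such a speed label might be constructed from the two camera poses, the following is a minimal sketch in the style of a classical PBVS control law. The function name, the proportional gain lam, and the use of SciPy are assumptions of the example, not details specified by the application.
```python
import numpy as np
from scipy.spatial.transform import Rotation

def pbvs_speed_label(T_init, T_des, lam=0.5):
    """Illustrative PBVS-style speed label from two camera poses.

    T_init, T_des: 4x4 homogeneous camera poses (initial and desired)
    in a common base frame. Returns a 6-vector (vx, vy, vz, wx, wy, wz)
    expressed in the initial camera frame.
    """
    # Pose of the desired camera frame expressed in the initial camera frame.
    T_rel = np.linalg.inv(T_init) @ T_des
    t_err = T_rel[:3, 3]                                        # translation error
    theta_u = Rotation.from_matrix(T_rel[:3, :3]).as_rotvec()   # axis-angle error
    # Proportional control toward the desired pose: a straight Cartesian path.
    return np.concatenate([lam * t_err, lam * theta_u])

# Each (initial pose, desired pose) pair yields one speed label, giving the
# one-to-one correspondence with the first initial image features.
```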
With reference to the first aspect, in certain implementations of the first aspect, the first speed dataset is a first camera speed dataset or a first robot-arm end speed dataset; and training the neural network model based on the first image feature dataset and the first speed dataset to obtain the first visual servo model comprises: training the neural network model with the first image feature dataset as input and the first speed dataset as target output, to obtain the first visual servo model.
In the method provided by the embodiments of the present application, the speeds in the first speed dataset may be camera speeds or robot-arm end speeds, so the neural network model can be flexibly trained to obtain first visual servo models with different outputs. If the output of the first visual servo model is the camera speed, it can be converted into the robot-arm end speed through a hand-eye conversion function, so the model can be applied flexibly in environments where the relative position between the robot-arm end coordinate system and the camera coordinate system differs. If the output of the first visual servo model is the robot-arm end speed, the robot can be controlled directly without speed conversion, reducing the computational cost of the visual servo model.
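As an illustration of this direct supervised training, the sketch below fits a small multilayer perceptron that maps the concatenated initial and desired image features to a 6-DoF speed. The feature dimension (four tracked image points), the architecture, and the hyperparameters are assumptions of the example only.
```python
import torch
import torch.nn as nn

# Assumed input: initial + desired features of 4 image points -> 2*4*2 = 16 values.
model = nn.Sequential(
    nn.Linear(16, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 6),            # (vx, vy, vz, wx, wy, wz)
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def train_step(features, speed_labels):
    """One update: image feature dataset in, first speed dataset as target."""
    optimizer.zero_grad()
    loss = loss_fn(model(features), speed_labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```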
With reference to the first aspect, in certain implementations of the first aspect, the first speed dataset is a first robot-arm end speed dataset; and training the neural network model based on the first image feature dataset and the first speed dataset to obtain the first visual servo model comprises: inputting the first image feature dataset into the neural network model and outputting a first predicted camera speed dataset; converting the first predicted camera speed dataset into a first predicted robot-arm end speed dataset according to a first initial hand-eye conversion function, the first initial hand-eye conversion function comprising first initial camera extrinsic parameters; and adjusting the weights of the neural network model based on the first predicted robot-arm end speed dataset and the first speed dataset, to obtain the first visual servo model.
In this way, the method provided by the embodiments of the present application can apply a coordinate transformation to the output of the neural network model and compare the transformed prediction with the speeds in the first robot-arm end speed dataset, so as to adjust the weights of the neural network model.
Optionally, the first initial camera extrinsic parameters may be adjusted based on the first predicted robot-arm end speed dataset and the first speed dataset, to obtain first updated camera extrinsic parameters.
That is, the first initial camera extrinsic parameters can be adjusted based on the predictions obtained after converting the network output and the speeds in the first robot-arm end speed dataset, yielding more accurate camera extrinsic parameters and improving the accuracy of the speed conversion.
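One way to read this scheme is to place the hand-eye conversion behind the network as a differentiable velocity transform and mark the extrinsics themselves as learnable, so that the same backward pass that adjusts the network weights can also refine the extrinsics. The sketch below assumes an eye-in-hand configuration and the standard 6x6 twist transform between frames; it is an illustrative interpretation, not the application's exact formulation.
```python
import torch

def skew(v):
    """Skew-symmetric matrix [v]x of a 3-vector."""
    z = torch.zeros((), dtype=v.dtype)
    return torch.stack([
        torch.stack([z, -v[2], v[1]]),
        torch.stack([v[2], z, -v[0]]),
        torch.stack([-v[1], v[0], z]),
    ])

# First initial camera extrinsics: pose of the camera frame in the robot-arm
# end frame; rough values, marked learnable. (A production version would
# parameterize the rotation, e.g. as axis-angle, to keep it orthogonal.)
R = torch.eye(3, requires_grad=True)
t = torch.zeros(3, requires_grad=True)

def camera_to_end_speed(v_cam):
    """Map a batch of 6-D camera twists (v, w) to robot-arm end twists."""
    top = torch.cat([R, skew(t) @ R], dim=1)        # 3 x 6
    bot = torch.cat([torch.zeros(3, 3), R], dim=1)  # 3 x 6
    W = torch.cat([top, bot], dim=0)                # 6 x 6 velocity transform
    return v_cam @ W.T

# Toy stand-ins for the network and data of the previous sketch.
model = torch.nn.Sequential(torch.nn.Linear(16, 64), torch.nn.ReLU(),
                            torch.nn.Linear(64, 6))
features = torch.randn(32, 16)
end_speed_labels = torch.randn(32, 6)   # first robot-arm end speed dataset

optimizer = torch.optim.Adam(list(model.parameters()) + [R, t], lr=1e-4)
optimizer.zero_grad()
pred_end = camera_to_end_speed(model(features))     # predicted end speeds
loss = torch.nn.functional.mse_loss(pred_end, end_speed_labels)
loss.backward()        # gradients flow to the network weights AND to R, t
optimizer.step()
```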
With reference to the first aspect, in certain implementations of the first aspect, the method further includes: acquiring a second image feature dataset comprising a second desired image feature and a plurality of second initial image features; acquiring a second speed dataset, wherein each speed in the second speed dataset is determined from the camera pose corresponding to the second desired image feature and the camera pose corresponding to one of the second initial image features, the speeds corresponding one-to-one with the second initial image features; and updating the weights of the first visual servo model based on the second image feature dataset and the second speed dataset, to obtain a second visual servo model.
When the first visual servo model is applied in a use environment whose camera intrinsic parameters, camera extrinsic parameters, or object 3D model differ somewhat from those used in training, the weights of the first visual servo model can be adjusted with the second image feature dataset and the second speed dataset to obtain a second visual servo model. Because the second visual servo model can be obtained from a small dataset, this improves the efficiency of obtaining the visual servo model and reduces the training cost.
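A minimal sketch of this fine-tuning stage, assuming the trained weights of the first model are available and the second dataset is small; the toy tensors, loader, and learning rate are illustrative.
```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for the first visual servo model; in practice its trained
# weights would be loaded here (e.g. model.load_state_dict(...)).
model = torch.nn.Sequential(torch.nn.Linear(16, 64), torch.nn.ReLU(),
                            torch.nn.Linear(64, 6))

# Small second dataset collected in the target environment (toy tensors here).
second_loader = DataLoader(TensorDataset(torch.randn(256, 16),
                                         torch.randn(256, 6)), batch_size=32)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)  # low LR: small updates
loss_fn = torch.nn.MSELoss()
for features, speed_labels in second_loader:
    optimizer.zero_grad()
    loss = loss_fn(model(features), speed_labels)   # adapt to the new environment
    loss.backward()
    optimizer.step()
```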
With reference to the first aspect, in certain implementations of the first aspect, the output of the first visual servo model is the camera speed and the second speed dataset is a second robot-arm end speed dataset; and updating the weights of the first visual servo model based on the second image feature dataset and the second speed dataset to obtain the second visual servo model comprises: inputting the second image feature dataset into the first visual servo model and outputting a second predicted camera speed dataset; converting the second predicted camera speed dataset into a second predicted robot-arm end speed dataset according to a second initial hand-eye conversion function, the second initial hand-eye conversion function comprising second initial camera extrinsic parameters; and updating the weights of the first visual servo model based on the second predicted robot-arm end speed dataset and the second speed dataset, to obtain the second visual servo model.
Here the output of the first visual servo model may be the camera speed, which is converted into the robot-arm end speed; the converted end speed is then compared with the speeds in the second robot-arm end speed dataset to adjust the weights of the first visual servo model, eliminating the influence of errors in the camera intrinsic parameters and the object's 3D model and improving the control performance of visual servoing.
With reference to the first aspect, in certain implementations of the first aspect, the output of the first visual servo model is the robot-arm end speed and the second speed dataset is a second robot-arm end speed dataset; and updating the weights of the first visual servo model based on the second image feature dataset and the second speed dataset to obtain the second visual servo model comprises: inputting the second image feature dataset into the first visual servo model and outputting a second predicted robot-arm end speed dataset; and updating the weights of the first visual servo model based on the second predicted robot-arm end speed dataset and the second speed dataset, to obtain the second visual servo model.
In this case, the weights of the first visual servo model are adjusted based on the robot-arm end speeds output by the first visual servo model and the second speed dataset, eliminating the influence of errors in the camera intrinsic parameters, the camera extrinsic parameters, and the object's 3D model, and yielding a second visual servo model with more accurate predictions.
Optionally, the second initial camera extrinsic parameters may be updated based on the second predicted robot-arm end speed dataset and the second speed dataset, to obtain second updated camera extrinsic parameters.
That is, the initial camera extrinsic parameters can be adjusted based on the difference between the converted robot-arm end speeds and the speeds in the second speed dataset, yielding updated camera extrinsic parameters. This eliminates errors in the initial camera extrinsic parameters and improves the accuracy of the speed conversion.
In a second aspect, a visual servoing method is provided, comprising: acquiring target data comprising initial image features and desired image features; inputting the target data into a first visual servo model or a second visual servo model and outputting a target speed, the target speed being a camera speed or a robot-arm end speed; and controlling robot motion based on the initial image features, the desired image features, and the target speed. The first visual servo model is obtained by training a neural network model based on a first image feature dataset and a first speed dataset, where the first image feature dataset comprises a first desired image feature and a plurality of first initial image features, each speed in the first speed dataset is determined from the camera pose corresponding to the first desired image feature and the camera pose corresponding to one of the first initial image features, and the speeds correspond one-to-one with the first initial image features. The second visual servo model is obtained by updating the weights of the first visual servo model based on a second image feature dataset and a second speed dataset, where the second image feature dataset comprises a second desired image feature and a plurality of second initial image features, each speed in the second speed dataset is determined from the camera pose corresponding to the second desired image feature and the camera pose corresponding to one of the second initial image features, and the speeds correspond one-to-one with the second initial image features.
The visual servoing method provided by the embodiments of the present application can control the robot to move along a trajectory that is as straight as possible while minimizing the error between the initial image features and the desired image features, improving the efficiency and stability of visual servoing, and hence its control performance, while maintaining high servo accuracy.
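A closed-loop use of the model might look like the sketch below; camera.read_features() and robot.set_end_speed() are placeholder interfaces assumed to exist in the deployment environment, not real library APIs.
```python
import torch

def servo_loop(model, camera, robot, desired_features, tol=1e-3, max_steps=500):
    """Illustrative visual servoing loop driven by the trained model."""
    for _ in range(max_steps):
        current = camera.read_features()          # initial image features
        if torch.linalg.norm(desired_features - current) < tol:
            robot.set_end_speed(torch.zeros(6))   # converged: stop the arm
            return True
        x = torch.cat([current, desired_features]).unsqueeze(0)
        target_speed = model(x).squeeze(0)        # camera or end speed
        robot.set_end_speed(target_speed)         # command the robot
    return False
```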
With reference to the second aspect, in certain implementations of the second aspect, the first speed dataset is a first camera speed dataset or a first robot-arm end speed dataset; and the first visual servo model being obtained by training the neural network model based on the first image feature dataset and the first speed dataset comprises: the first visual servo model is obtained by training the neural network model with the first image feature dataset as input and the first speed dataset as target output.
In the visual servoing method provided by the embodiments of the present application, the output of the first visual servo model may be the camera speed, which can be converted into the robot-arm end speed through a hand-eye conversion function, so the method can be applied flexibly in environments where the relative position between the robot-arm end coordinate system and the camera coordinate system differs. The output may also be the robot-arm end speed, in which case the robot can be controlled directly without speed conversion, reducing the computational cost of the visual servoing process.
With reference to the second aspect, in certain implementations of the second aspect, the first speed dataset is a first robot-arm end speed dataset; and the first visual servo model being obtained by training the neural network model based on the first image feature dataset and the first speed dataset comprises: inputting the first image feature dataset into the neural network model and outputting a first predicted camera speed dataset; converting the first predicted camera speed dataset into a first predicted robot-arm end speed dataset according to a first initial hand-eye conversion function, the first initial hand-eye conversion function comprising first initial camera extrinsic parameters; and adjusting the weights of the neural network model based on the first predicted robot-arm end speed dataset and the first speed dataset.
Because the first visual servo model is obtained by applying a coordinate transformation to the output of the neural network model and comparing the transformed prediction with the speeds in the first robot-arm end speed dataset, adjusting the weights of the neural network model in this way improves the control performance of visual servoing.
With reference to the second aspect, in certain implementations of the second aspect, if the target speed is a target camera speed, the method further includes: acquiring a first updated hand-eye conversion function comprising first updated camera extrinsic parameters, the first updated camera extrinsic parameters being obtained by updating the first initial camera extrinsic parameters based on the first predicted robot-arm end speed dataset and the first speed dataset. Controlling robot motion based on the initial image features, the desired image features, and the target speed then comprises: converting the target camera speed into a target robot-arm end speed according to the first updated hand-eye conversion function; and controlling the robot to move according to the initial image features, the desired image features, and the target robot-arm end speed.
In this way, the first initial camera extrinsic parameters can be adjusted based on the converted network predictions and the speeds in the first robot-arm end speed dataset, yielding more accurate camera extrinsic parameters and improving the accuracy of the speed conversion.
With reference to the second aspect, in certain implementations of the second aspect, the output of the first visual servo model is the camera speed and the second speed dataset is a second robot-arm end speed dataset; and the second visual servo model being obtained by updating the weights of the first visual servo model based on the second image feature dataset and the second speed dataset comprises: inputting the second image feature dataset into the first visual servo model and outputting a second predicted camera speed dataset; converting the second predicted camera speed dataset into a second predicted robot-arm end speed dataset according to a second initial hand-eye conversion function, the second initial hand-eye conversion function comprising second initial camera extrinsic parameters; and updating the weights of the first visual servo model based on the second predicted robot-arm end speed dataset and the second speed dataset.
When the first visual servo model is applied in a use environment whose camera intrinsic parameters, camera extrinsic parameters, or object 3D model differ somewhat from those used in training, the weights of the first visual servo model can be adjusted with the second image feature dataset and the second speed dataset to obtain the second visual servo model. When the second visual servo model is then used for visual servoing, the influence of errors such as those in the camera intrinsic parameters and the object's 3D model can be eliminated, improving the accuracy of visual servoing.
With reference to the second aspect, in certain implementations of the second aspect, if the target speed is a target camera speed, the method further includes: acquiring a second updated hand-eye conversion function comprising second updated camera extrinsic parameters, the second updated camera extrinsic parameters being obtained by updating the second initial camera extrinsic parameters based on the second predicted robot-arm end speed dataset and the second speed dataset. Controlling robot motion based on the initial image features, the desired image features, and the target speed then comprises: converting the target camera speed into a target robot-arm end speed according to the second updated hand-eye conversion function; and controlling the robot to move according to the initial image features, the desired image features, and the target robot-arm end speed.
Using the second visual servo model for visual servoing in this way eliminates the influence of errors such as those in the camera intrinsic parameters and the object's 3D model, improving the control performance of visual servoing.
With reference to the second aspect, in certain implementations of the second aspect, the output of the first visual servo model is the robot-arm end speed and the second speed dataset is a second robot-arm end speed dataset; and the second visual servo model being obtained by updating the weights of the first visual servo model based on the second image feature dataset and the second speed dataset comprises: inputting the second image feature dataset into the first visual servo model and outputting a second predicted robot-arm end speed dataset; and updating the weights of the first visual servo model based on the second predicted robot-arm end speed dataset and the second speed dataset.
Adjusting the weights of the first visual servo model based on the robot-arm end speeds it outputs and the second speed dataset eliminates the influence of errors in the camera intrinsic parameters, the camera extrinsic parameters, and the object's 3D model, so that the second visual servo model predicts more accurately and the control performance of visual servoing improves.
In a third aspect, an apparatus for acquiring a visual servo model is provided, comprising: an acquisition unit configured to acquire a first image feature dataset comprising a first desired image feature and a plurality of first initial image features, and to acquire a first speed dataset, wherein each speed in the first speed dataset is determined from the camera pose corresponding to the first desired image feature and the camera pose corresponding to one of the first initial image features, the speeds corresponding one-to-one with the first initial image features; and a processing unit configured to train a neural network model based on the first image feature dataset and the first speed dataset to obtain a first visual servo model.
In the apparatus provided by the embodiments of the present application, the speeds in the first speed dataset may be camera speeds or robot-arm end speeds, so the neural network model can be flexibly trained to obtain first visual servo models with different outputs. If the output of the first visual servo model is the camera speed, it can be converted into the robot-arm end speed through a hand-eye conversion function, so the apparatus can be applied flexibly in environments where the relative position between the robot-arm end coordinate system and the camera coordinate system differs. If the output is the robot-arm end speed, the robot can be controlled directly without speed conversion, reducing the computational cost of the visual servo model.
With reference to the third aspect, in certain implementations of the third aspect, the first speed dataset is a first camera speed dataset or a first robot-arm end speed dataset, and the processing unit is configured to train the neural network model with the first image feature dataset as input and the first speed dataset as target output, to obtain the first visual servo model.
The apparatus can apply a coordinate transformation to the output of the neural network model and compare the transformed prediction with the speeds in the first robot-arm end speed dataset, so as to adjust the weights of the neural network model and improve the control performance of visual servoing.
Optionally, the first initial camera extrinsic parameters may be adjusted based on the first predicted robot-arm end speed dataset and the first speed dataset, to obtain first updated camera extrinsic parameters.
The apparatus can thus also adjust the first initial camera extrinsic parameters based on the converted network predictions and the speeds in the first robot-arm end speed dataset, yielding more accurate camera extrinsic parameters and improving the accuracy of the speed conversion.
With reference to the third aspect, in certain implementations of the third aspect, the first speed dataset is a first robot-arm end speed dataset, and the processing unit is configured to: input the first image feature dataset into the neural network model and output a first predicted camera speed dataset; convert the first predicted camera speed dataset into a first predicted robot-arm end speed dataset according to a first initial hand-eye conversion function, the first initial hand-eye conversion function comprising first initial camera extrinsic parameters; and adjust the weights of the neural network model based on the first predicted robot-arm end speed dataset and the first speed dataset, to obtain the first visual servo model.
When the first visual servo model is applied in a use environment whose camera intrinsic parameters, camera extrinsic parameters, or object 3D model differ somewhat from those used in training, the weights of the first visual servo model can be adjusted with a second image feature dataset and a second speed dataset to obtain a second visual servo model. Because the second visual servo model can be obtained from a small dataset, this improves the efficiency of obtaining the visual servo model and reduces the training cost.
With reference to the third aspect, in certain implementations of the third aspect, the acquisition unit is configured to acquire a second image feature dataset comprising a second desired image feature and a plurality of second initial image features, and to acquire a second speed dataset, wherein each speed in the second speed dataset is determined from the camera pose corresponding to the second desired image feature and the camera pose corresponding to one of the second initial image features, the speeds corresponding one-to-one with the second initial image features; and the processing unit is configured to update the weights of the first visual servo model based on the second image feature dataset and the second speed dataset, to obtain a second visual servo model.
The output of the first visual servo model may be the camera speed, which is converted into the robot-arm end speed; the converted end speed is then compared with the speeds in the second robot-arm end speed dataset to adjust the weights of the first visual servo model, eliminating the influence of errors in the camera intrinsic parameters and the object's 3D model and improving the control performance of visual servoing.
With reference to the third aspect, in some implementations of the third aspect, the output of the first visual servo model is the camera speed and the second speed dataset is a second robot-arm end speed dataset, and the processing unit is further configured to: input the second image feature dataset into the first visual servo model and output a second predicted camera speed dataset; convert the second predicted camera speed dataset into a second predicted robot-arm end speed dataset according to a second initial hand-eye conversion function, the second initial hand-eye conversion function comprising second initial camera extrinsic parameters; and update the weights of the first visual servo model based on the second predicted robot-arm end speed dataset and the second speed dataset, to obtain the second visual servo model.
The apparatus provided by the embodiments of the present application trains the neural network model on image features paired with the speeds of a position-based visual servo controller, so the resulting visual servo model can control the robot to move along a trajectory that is as straight as possible while minimizing the error between the initial image features and the desired image features. This improves the efficiency and stability of visual servoing, and hence its control performance, while maintaining high servo accuracy.
With reference to the third aspect, in some implementations of the third aspect, the output of the first visual servo model is the robot-arm end speed and the second speed dataset is a second robot-arm end speed dataset, and the processing unit is further configured to: input the second image feature dataset into the first visual servo model and output a second predicted robot-arm end speed dataset; and update the weights of the first visual servo model based on the second predicted robot-arm end speed dataset and the second speed dataset, to obtain the second visual servo model.
The weights of the first visual servo model are adjusted based on the robot-arm end speeds it outputs and the second speed dataset, eliminating the influence of errors in the camera intrinsic parameters, the camera extrinsic parameters, and the object's 3D model, and yielding a second visual servo model with more accurate predictions.
Optionally, first updated camera extrinsic parameters are obtained by adjusting the first initial camera extrinsic parameters based on the first predicted robot-arm end speed dataset and the first speed dataset.
The apparatus can thus adjust the first initial camera extrinsic parameters based on the converted network predictions and the speeds in the first robot-arm end speed dataset, yielding more accurate camera extrinsic parameters and improving the accuracy of the speed conversion.
In a fourth aspect, a visual servoing device is provided, comprising: an acquisition unit configured to acquire target data comprising initial image features and desired image features; and a processing unit configured to input the target data into a first visual servo model or a second visual servo model and output a target speed, the target speed being a camera speed or a robot-arm end speed, and to control robot motion according to the initial image features, the desired image features, and the target speed. The first visual servo model is obtained by training a neural network model based on a first image feature dataset and a first speed dataset, where the first image feature dataset comprises a first desired image feature and a plurality of first initial image features, each speed in the first speed dataset is determined from the camera pose corresponding to the first desired image feature and the camera pose corresponding to one of the first initial image features, and the speeds correspond one-to-one with the first initial image features. The second visual servo model is obtained by updating the weights of the first visual servo model based on a second image feature dataset and a second speed dataset, where the second image feature dataset comprises a second desired image feature and a plurality of second initial image features, each speed in the second speed dataset is determined from the camera pose corresponding to the second desired image feature and the camera pose corresponding to one of the second initial image features, and the speeds correspond one-to-one with the second initial image features.
The visual servoing device provided by the embodiments of the present application can control the robot to move along a trajectory that is as straight as possible while minimizing the error between the initial image features and the desired image features, improving the efficiency and stability of visual servoing, and hence its control performance, while maintaining high servo accuracy.
With reference to the fourth aspect, in certain implementations of the fourth aspect, the first speed dataset is a first camera speed dataset or a first robot-arm end speed dataset; and the first visual servo model being obtained by training the neural network model based on the first image feature dataset and the first speed dataset comprises: the first visual servo model is obtained by training the neural network model with the first image feature dataset as input and the first speed dataset as target output.
The output of the first visual servo model may be the camera speed, which can be converted into the robot-arm end speed through a hand-eye conversion function, so the device can be applied flexibly in environments where the relative position between the robot-arm end coordinate system and the camera coordinate system differs. The output may also be the robot-arm end speed, in which case the robot can be controlled directly without speed conversion, reducing the computational cost of the visual servoing process.
With reference to the fourth aspect, in certain implementations of the fourth aspect, the first speed dataset is a first robot-arm end speed dataset; and the first visual servo model being obtained by training the neural network model based on the first image feature dataset and the first speed dataset comprises: inputting the first image feature dataset into the neural network model and outputting a first predicted camera speed dataset; converting the first predicted camera speed dataset into a first predicted robot-arm end speed dataset according to a first initial hand-eye conversion function, the first initial hand-eye conversion function comprising first initial camera extrinsic parameters; and adjusting the weights of the neural network model based on the first predicted robot-arm end speed dataset and the first speed dataset.
Because the first visual servo model is obtained by applying a coordinate transformation to the output of the neural network model and comparing the transformed prediction with the speeds in the first robot-arm end speed dataset, adjusting the weights of the neural network model in this way improves the control performance of visual servoing.
With reference to the fourth aspect, in some implementations of the fourth aspect, if the target speed is a target camera speed, the acquisition unit is further configured to acquire a first updated hand-eye conversion function comprising first updated camera extrinsic parameters, the first updated camera extrinsic parameters being obtained by updating the first initial camera extrinsic parameters based on the first predicted robot-arm end speed dataset and the first speed dataset; and the processing unit is configured to: convert the target camera speed into a target robot-arm end speed according to the first updated hand-eye conversion function; and control the robot to move according to the initial image features, the desired image features, and the target robot-arm end speed.
The first initial camera extrinsic parameters can thus be adjusted based on the converted network predictions and the speeds in the first robot-arm end speed dataset, yielding more accurate camera extrinsic parameters and improving the accuracy of the speed conversion.
With reference to the fourth aspect, in certain implementations of the fourth aspect, the output of the first visual servo model is the camera speed and the second speed dataset is a second robot-arm end speed dataset; and the second visual servo model being obtained by updating the weights of the first visual servo model based on the second image feature dataset and the second speed dataset comprises: inputting the second image feature dataset into the first visual servo model and outputting a second predicted camera speed dataset; converting the second predicted camera speed dataset into a second predicted robot-arm end speed dataset according to a second initial hand-eye conversion function, the second initial hand-eye conversion function comprising second initial camera extrinsic parameters; and updating the weights of the first visual servo model based on the second predicted robot-arm end speed dataset and the second speed dataset.
When the first visual servo model is applied in a use environment whose camera intrinsic parameters, camera extrinsic parameters, or object 3D model differ somewhat from those used in training, the weights of the first visual servo model can be adjusted with the second image feature dataset and the second speed dataset to obtain the second visual servo model. When the second visual servo model is then used for visual servoing, the influence of errors such as those in the camera intrinsic parameters and the object's 3D model can be eliminated, improving the accuracy of visual servoing.
With reference to the fourth aspect, in some implementations of the fourth aspect, if the target speed is a target camera speed, the acquisition unit is further configured to acquire a second updated hand-eye conversion function comprising second updated camera extrinsic parameters, the second updated camera extrinsic parameters being obtained by updating the second initial camera extrinsic parameters based on the second predicted robot-arm end speed dataset and the second speed dataset; and the processing unit is configured to: convert the target camera speed into a target robot-arm end speed according to the second updated hand-eye conversion function; and control the robot to move according to the initial image features, the desired image features, and the target robot-arm end speed.
Using the second visual servo model for visual servoing in this way eliminates the influence of errors such as those in the camera intrinsic parameters and the object's 3D model, improving the control performance of visual servoing.
With reference to the fourth aspect, in certain implementations of the fourth aspect, the output of the first visual servo model is the robot-arm end speed and the second speed dataset is a second robot-arm end speed dataset; and the second visual servo model being obtained by updating the weights of the first visual servo model based on the second image feature dataset and the second speed dataset comprises: inputting the second image feature dataset into the first visual servo model and outputting a second predicted robot-arm end speed dataset; and updating the weights of the first visual servo model based on the second predicted robot-arm end speed dataset and the second speed dataset.
Adjusting the weights of the first visual servo model based on the robot-arm end speeds it outputs and the second speed dataset eliminates the influence of errors in the camera intrinsic parameters, the camera extrinsic parameters, and the object's 3D model, so that the second visual servo model predicts more accurately and the control performance of visual servoing improves.
In a fifth aspect, an apparatus for acquiring a visual servo model is provided, the apparatus comprising: a memory for storing a program; and a processor for executing the program stored in the memory, the processor being configured to perform the method in any implementation of the first aspect when the program stored in the memory is executed.
The processor in the fifth aspect may be a central processing unit (CPU), or a combination of a CPU and a neural network operation processor, where the neural network operation processor may include a graphics processing unit (GPU), a neural-network processing unit (NPU), a tensor processing unit (TPU), and the like. The TPU is an artificial intelligence accelerator application-specific integrated circuit fully customized by Google for machine learning.
In a sixth aspect, a visual servoing device is provided, comprising: a memory for storing a program; and a processor for executing the program stored in the memory, the processor being configured to perform the method in any implementation of the second aspect when the program stored in the memory is executed.
The processor in the sixth aspect may likewise be a central processing unit, or a combination of a CPU and a neural network operation processor such as a GPU, NPU, or TPU.
In a seventh aspect, a computer readable medium is provided, the computer readable medium storing program code for execution by a device, the program code comprising instructions for performing the method in either of the implementations of the first or second aspects.
In an eighth aspect, there is provided a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of any one of the implementations of the first or second aspects described above.
In a ninth aspect, a chip is provided, the chip including a processor and a data interface, the processor reading instructions stored on a memory through the data interface, and executing the method in any implementation manner of the first aspect or the second aspect.
Optionally, as an implementation manner, the chip may further include a memory, where the memory stores instructions, and the processor is configured to execute the instructions stored on the memory, where the instructions, when executed, are configured to perform the method in any implementation manner of the first aspect or the second aspect.
The chip may be a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC).
Drawings
Fig. 1 is a schematic diagram of an SCN according to an embodiment of the present application.
Fig. 2 is a system architecture diagram for model training according to an embodiment of the present application.
Fig. 3 is a schematic structural diagram of a system for obtaining a visual servo model according to an embodiment of the present application.
Fig. 4 is a schematic flow chart of obtaining a visual servo model according to an embodiment of the present application.
Fig. 5 is a schematic flow chart of obtaining a visual servo model according to an embodiment of the present application.
Fig. 6 is a schematic flow chart of a visual servoing method according to an embodiment of the present application.
Fig. 7 is a schematic diagram of the servo trajectories of a visual servo model and an IBVS controller according to an embodiment of the present application.
Fig. 8 is a schematic diagram of control performance of the IBVS controller and the visual servo model according to an embodiment of the present application.
Fig. 9 is a schematic diagram of control performance of a conventional PBVS controller and a visual servo model according to an embodiment of the present application.
Fig. 10 is a schematic diagram of an apparatus for obtaining a visual servoing model according to an embodiment of the present application.
Fig. 11 is a schematic diagram of a visual servoing device according to an embodiment of the present application.
Fig. 12 is a schematic block diagram of another apparatus for acquiring a visual servo model according to an embodiment of the present application.
Fig. 13 is a schematic block diagram of another visual servoing device according to an embodiment of the present application.
Detailed Description
Since the embodiments of the present application involve the application of neural networks, for ease of understanding, the terms and concepts related to neural networks that may be involved in the embodiments of the present application are first described below.
(1) Neural network
A neural network may be composed of neural units. A neural unit may be an arithmetic unit that takes $x_s$ and an intercept of 1 as inputs, and the output of the arithmetic unit may be:
$$h_{W,b}(x) = f\left(W^T x\right) = f\left(\sum_{s=1}^{n} W_s x_s + b\right)$$
where $s = 1, 2, \ldots, n$, $n$ is a natural number greater than 1, $W_s$ is the weight of $x_s$, and $b$ is the bias of the neural unit. $f$ is the activation function of the neural unit, which introduces a nonlinear characteristic into the neural network to convert the input signal of the neural unit into an output signal. The output signal of the activation function may serve as the input of the next convolutional layer, and the activation function may be, for example, a sigmoid function. A neural network is a network formed by coupling many such single neural units together, i.e., the output of one neural unit may be the input of another neural unit. The input of each neural unit may be connected to the local receptive field of the previous layer to extract the features of that local receptive field; the local receptive field may be a region composed of several neural units.
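As a small numeric illustration of the unit just described (the weights, inputs, and sigmoid choice are arbitrary):
```python
import numpy as np

def neuron(x, W, b):
    """f(sum_s W_s * x_s + b) with f = sigmoid."""
    return 1.0 / (1.0 + np.exp(-(np.dot(W, x) + b)))

x = np.array([0.5, -1.0, 2.0])   # inputs x_s
W = np.array([0.8, 0.2, -0.5])   # weights W_s
print(neuron(x, W, b=0.1))       # sigmoid(-0.7) ~= 0.332
```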
(2) Deep neural network
A deep neural network (DNN), also known as a multi-layer neural network, can be understood as a neural network with multiple hidden layers. According to the position of each layer, the layers of a DNN can be divided into three types: the input layer, the hidden layers, and the output layer. Typically the first layer is the input layer, the last layer is the output layer, and all layers in between are hidden layers. The layers are fully connected, that is, any neuron in the i-th layer is connected to every neuron in the (i+1)-th layer.
Although a DNN looks complex, the work of each layer is not complex in itself; it is simply the following linear relational expression: $\vec{y} = \alpha(W \vec{x} + \vec{b})$, where $\vec{x}$ is the input vector, $\vec{y}$ is the output vector, $\vec{b}$ is the bias vector, $W$ is the weight matrix (also called the coefficients), and $\alpha(\cdot)$ is the activation function. Each layer simply performs this operation on its input vector $\vec{x}$ to obtain the output vector $\vec{y}$. Since a DNN has many layers, the numbers of coefficients $W$ and bias vectors $\vec{b}$ are also large. These parameters are defined in a DNN as follows, taking the coefficient $W$ as an example: in a three-layer DNN, the linear coefficient from the 4th neuron of the second layer to the 2nd neuron of the third layer is defined as $W^3_{24}$, where the superscript 3 denotes the layer of the coefficient and the subscripts correspond to the output index 2 of the third layer and the input index 4 of the second layer.
In summary, the coefficient from the k-th neuron of layer $L-1$ to the j-th neuron of layer $L$ is defined as $W^L_{jk}$.
It should be noted that the input layer has no $W$ parameters. In a deep neural network, more hidden layers give the network a greater capacity to characterize complex situations in the real world. In theory, a model with more parameters has higher complexity and a greater "capacity", meaning it can accomplish more complex learning tasks. Training the deep neural network is the process of learning the weight matrices; the final objective is to obtain the weight matrix of every layer of the trained deep neural network (the weight matrices formed by the vectors $W$ of the many layers).
(3) Convolutional neural network
A convolutional neural network (CNN) is a feed-forward neural network. Its artificial neurons respond to part of the surrounding units within their coverage area, which makes it well suited to processing large images. A CNN consists of one or more convolutional layers and a fully connected layer at the top (corresponding to a classical neural network), and also includes associated weights and pooling layers. This structure enables the convolutional neural network to exploit the two-dimensional structure of the input data. CNNs can give better results in image and speech recognition than other deep learning structures. The model can also be trained with a back-propagation algorithm. Compared with other deep feed-forward neural networks, CNNs require fewer parameters to estimate, making them an attractive deep learning structure.
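For illustration only, a minimal PyTorch sketch of the structure just described (one convolutional layer with its weights, a pooling layer, and a fully connected top layer; all sizes are arbitrary assumptions):
```python
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # convolutional layer + weights
    nn.ReLU(),
    nn.MaxPool2d(2),                             # pooling layer
    nn.Flatten(),
    nn.Linear(16 * 16 * 16, 10),                 # fully connected top layer
)
# For a 3 x 32 x 32 input image: the convolution keeps 32x32, pooling
# halves it to 16x16, giving 16*16*16 flattened features.
```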
(4) Loss function
In training a deep neural network, since the output of the network is expected to be as close as possible to the value actually desired, the weight vectors of each layer can be updated by comparing the predicted value of the current network with the desired target value according to the difference between them (an initialization usually precedes the first update, i.e., parameters are pre-configured for each layer of the deep neural network). For example, if the network's predicted value is too high, the weight vectors are adjusted to lower the prediction, and the adjustment continues until the deep neural network can predict the desired target value or a value very close to it. It is therefore necessary to define in advance "how to compare the difference between the predicted value and the target value"; this is the role of the loss function or objective function, which are important equations for measuring that difference. Taking the loss function as an example, a higher output value (loss) of the loss function indicates a larger difference, so training the deep neural network becomes a process of reducing this loss as much as possible.
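For instance, a mean squared error loss is one common form of such an equation (values purely illustrative):
```python
import numpy as np

def mse_loss(predicted, target):
    """Mean squared error: smaller means the prediction is closer to the target."""
    return np.mean((predicted - target) ** 2)

pred = np.array([0.9, 0.2, 0.4])
true = np.array([1.0, 0.0, 0.5])
print(mse_loss(pred, true))   # 0.02 -> training drives this toward zero
```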
(5) Observer network (spatial configuration-net, SCN)
Fig. 1 is a schematic diagram of an SCN according to an embodiment of the present application. The SCN is composed of a neural network; it can extract the feature information of an object in an image and convert it into image pixel coordinates. An image contains a large amount of information, part of which is redundant, so usually only the feature information of the image is extracted, such as global features (the color, shape, and texture of the image) or features of local regions (edges, corner points, and special areas of the image).
Fig. 2 is a system architecture diagram for model training according to an embodiment of the present application.
Referring to fig. 2, the data collection device 260 is configured to collect training data and store the training data in the database 230, and the training device 220 is configured to train to obtain the target model/rule 201 based on the training data maintained in the database 230.
The training process of the neural network is essentially a way of learning how to control the spatial transformation, and more specifically of learning the weight matrices. Because the output of the deep neural network should be as close as possible to the value actually expected, the training of the target model/rule 201 can be completed by comparing the predicted value of the current network with the actually expected target value and updating the weight vector of each layer of the neural network according to the difference between them, until the difference between the predicted value output by the training device 220 and the target value is smaller than a certain threshold.
The above-described object model/rules 201 can be used to implement the visual servoing method of an embodiment of the present application. The target model/rule 201 in the embodiment of the present application may be specifically a neural network model. It should be noted that, in practical applications, the training data maintained in the database 230 is not necessarily all acquired by the data acquisition device 260, but may be received from other devices. It should be further noted that the training device 220 is not necessarily completely based on the training data maintained by the database 230 to perform training of the target model/rule 201, and it is also possible to obtain the training data from the cloud or other places to perform model training, which should not be taken as a limitation of the embodiments of the present application.
The target model/rule 201 trained according to the training device 220 may be applied to different systems or devices, such as the execution device 210 shown in fig. 2.
The execution device 210 may be a terminal, such as a mobile phone terminal, a tablet computer, a notebook computer, an augmented reality (AR)/virtual reality (VR) device, or a vehicle-mounted terminal, or may be a server or a cloud. In fig. 2, the execution device 210 is configured with an input/output (I/O) interface 212 for data interaction with external devices, and a user may input data to the I/O interface 212 through the client device 240; in an embodiment of the present application, the input data may include initial image features and desired image features.
When the execution device 210 preprocesses input data, or when the computation module 211 of the execution device 210 performs computation-related processing, the execution device 210 may call data, code, and the like in the data storage system 250 for the corresponding processing, and may also store the data, instructions, and the like obtained from that processing in the data storage system 250.
Finally, the I/O interface 212 returns the processing results, such as the resulting target speed, to the client device 240 for presentation to the user.
It should be noted that the training device 220 may generate, based on different training data, a corresponding target model/rule 201 for different targets or different tasks, where the corresponding target model/rule 201 may be used to achieve the targets or to perform the tasks, thereby providing the user with the desired results.
In the case shown in fig. 2, the user may manually give input data, which may be operated on through an interface provided by the I/O interface 212. In another case, the client device 240 may automatically send input data to the I/O interface 212; if automatic sending requires the user's authorization, the user may set the corresponding permission in the client device 240. The user may view the results output by the execution device 210 at the client device 240, presented for example as a display, a sound, or an action. The client device 240 may also serve as a data collection terminal, collecting the input data of the I/O interface 212 and the output results of the I/O interface 212 as new sample data and storing them in the database 230. Of course, the data may also not be collected by the client device 240: the input data input to the I/O interface 212 and the output results of the I/O interface 212 as shown in the figure may be stored directly into the database 230 as new sample data by the I/O interface 212.
It should be noted that fig. 2 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationship among devices, apparatuses, modules, etc. shown in the drawing is not limited in any way, for example, in fig. 2, the data storage system 250 is an external memory with respect to the execution device 210, and in other cases, the data storage system 250 may be disposed in the execution device 210.
As shown in fig. 2, the target model/rule 201 is trained according to the training device 220, where the target model/rule 201 may be the first visual servo model or the second visual servo model in an embodiment of the present application.
Visual servoing is a control mechanism that takes visual information as feedback, obtains geometric features from that visual information, and uses them to guide the movement of a robot. It is widely applied in fields such as industrial assembly-line production and aerospace.
Visual servoing can be divided into image-based visual servoing (IBVS) and position-based visual servoing (PBVS). IBVS directly compares the image signal measured in real time with the image information of the target position/pose and performs closed-loop feedback control on the resulting image error. The Cartesian-space trajectory generated by an IBVS controller is often very complex, especially in application scenarios such as guiding an unmanned aerial vehicle to land with IBVS, or servo tracking with a VGA-mounted camera, where the initial camera pose differs significantly from the target pose. The resulting curve in 3D space is then difficult to predict and potentially dangerous; in particular, when the target object is at the edge of the camera's field of view, feature points may be lost, causing the servo to fail.
PBVS uses the camera parameters to establish a mapping between the image signal and the position/pose of the robot; during servoing, it extracts the robot's position/pose information from the image signal and compares it with the target position/pose to form closed-loop feedback control. Compared with IBVS, PBVS takes the 3D relative pose between the target object and the camera as the visual feature; it can follow an essentially straight line in Cartesian space and is globally asymptotically stable. However, pose estimation from the visual features during PBVS servoing depends on environment parameters such as the intrinsic parameters of the vision system and the 3D model of the object. These environment parameters are difficult to measure accurately in practice, and errors in them, particularly errors in the 3D model of the object, can greatly reduce the control accuracy of PBVS.
At present, to improve the servo performance of visual servoing, additional sensors can be added to the robot vision system to supplement the information it collects. For example, a ranging sensor and a vision sensor can be combined so that distance information and image-feature information complement each other, addressing multi-information fusion and information redundancy, allowing the robot to perceive the environment more comprehensively and to locate the target object accurately. Although this approach improves the positioning of the target object, it requires introducing a ranging sensor and increases the computational complexity.
Applying neural networks to robotic servo tasks is another way to improve visual servo performance. For example, a convolutional neural network can be used to learn joint-space control for complex positioning tasks, reinforcement learning can be used to learn 6-degree-of-freedom closed-loop grasping, and a micro convolutional neural network can be used to estimate relative pose. However, these learning-based methods suffer from poor interpretability, long training time, high data cost, and difficulty of migration.
In view of the above problems, an embodiment of the present application provides a method for acquiring a visual servo model. The method trains a neural network model on image features together with speeds obtained from the initial camera poses and the desired camera pose corresponding to those image features, and thereby acquires the visual servo model. The obtained visual servo model can control the robot's movement along a trajectory that is as close to a straight line as possible so as to minimize the error between the initial image features and the desired image features. In this way, high servo precision is ensured while the efficiency and stability of visual servoing are improved, improving the control performance of visual servoing.
Fig. 3 is a schematic structural diagram of a system 300 for obtaining a visual servo model according to an embodiment of the present application. The visual servo model provided by the embodiment of the application can be obtained through the system 300.
The system 300 for acquiring a visual servoing model includes a neural network controller 310 and a PBVS controller 320.
In one implementation, the neural network controller 310 may perform the following process:
acquiring a first image feature dataset comprising a first desired image feature and a plurality of first initial image features;
acquiring a first speed data set, wherein the speed of the first speed data set is determined according to the camera pose corresponding to the first expected image feature and the camera pose corresponding to the first initial image features, and the speed of the first speed data set corresponds to the first initial image features one by one;
the neural network model is trained based on the first image feature dataset and the first velocity dataset to obtain a first visual servo model.
In one implementation, the neural network controller 310 may also perform the following process:
acquiring a second image feature dataset comprising a second desired image feature and a plurality of second initial image features;
acquiring a second speed data set, wherein the speeds in the second speed data set are determined according to the camera pose corresponding to the second desired image feature and the camera poses corresponding to the plurality of second initial image features, and the speeds in the second speed data set correspond one by one to the plurality of second initial image features;
and updating the weights of the first visual servo model based on the second image feature data set and the second speed data set to obtain a second visual servo model.
In one implementation, PBVS controller 320 may determine and output a first set of speed data based on the camera pose corresponding to the first desired image feature and the camera poses corresponding to the plurality of first initial image features.
In one implementation, PBVS controller 320 may determine and output a second set of speed data based on the camera pose corresponding to the second desired image feature and the camera poses corresponding to the plurality of second initial image features.
The first speed data set may be a first camera speed data set or a first robot tip speed data set. The second speed data set may be a second camera speed data set or a second robot end speed data set.
The first visual servo model or the second visual servo model may be acquired by the neural network controller 310 through the processes described above, and may be used for visual servo control and the like.
Of course, the PBVS controller may be given another name, provided it realizes a function the same as or similar to that of the PBVS controller, i.e. controlling the movement of the robot based on the difference between the initial camera pose and the desired camera pose. The application is not limited in this regard.
Fig. 4 is a flowchart of a method for obtaining a visual servoing model according to an embodiment of the present application.
S410, acquiring a first image characteristic data set.
The first image feature data set includes a first desired image feature and a plurality of first initial image features. The first desired image feature may be the image feature under the first desired camera pose.
In some embodiments, the image features of the object may be acquired using a trained neural network, such as SCN.
In some embodiments, computer vision methods may be utilized to obtain image features of an object.
S420, acquiring a first speed data set.
The speed of the first speed data set is determined according to the camera pose corresponding to the first expected image feature and the camera pose corresponding to the first initial image features, and the speed of the first speed data set corresponds to the first initial image features one by one.
The speeds in the first speed data set correspond one by one to the plurality of first initial image features. This can be understood as follows: when the first desired image feature and a given first initial image feature are used as the input of the neural network, the speed determined from the camera pose corresponding to the first desired image feature and the camera pose corresponding to that first initial image feature is used as the target output for training the neural network model.
In some implementations, the first speed data set may be a first camera speed data set. For example, the PBVS controller may determine the speed of the camera from the camera pose corresponding to the first desired image feature and the camera poses corresponding to the first initial image features.
In some embodiments, the first speed data set may be a first robot end speed data set. For example, the PBVS controller may obtain the velocity of the robotic arm tip from the camera pose corresponding to the first desired image feature and the camera poses corresponding to the first plurality of initial image features.
In some implementations, the corresponding camera pose may be determined from the image information, for example by simulation, or by PnP combined with an extended Kalman filter (EKF).
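As a sketch of the PnP route mentioned above (the EKF smoothing step is omitted, and all names are illustrative), the camera pose could be estimated from 2D-3D correspondences as follows:

```python
import cv2
import numpy as np

def estimate_camera_pose(object_points, image_points, K, dist_coeffs=None):
    # object_points: (N, 3) points from the object's 3D model;
    # image_points: (N, 2) corresponding pixel coordinates; K: camera intrinsics.
    if dist_coeffs is None:
        dist_coeffs = np.zeros(4)
    ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, dist_coeffs)
    if not ok:
        raise RuntimeError("PnP failed to find a pose")
    R, _ = cv2.Rodrigues(rvec)  # rotation vector -> rotation matrix
    return R, tvec
```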
In some embodiments, the camera pose at the time of capturing the image may be determined by a robotic system. For example, a camera may capture image information at a certain camera pose, which may be acquired by a robotic system.
And S430, training the neural network model based on the first image characteristic data set and the first speed data set to obtain a first visual servo model.
In the training of the neural network model, a loss function, such as the mean squared error loss, the exponential loss, or the squared loss, may be used to measure the difference between the actual output of the neural network model (or that output after speed conversion) and the target value (the speed in the first speed data set). For example, if the predicted value of the neural network model is too high, the weight vector is adjusted to lower the prediction, and the adjustment continues until the deep neural network predicts a value equal or very close to the target value, or until the output of the neural network after speed conversion is equal or very close to the target value.
In some embodiments, the neural network model is trained with the first image feature data set as input and the first speed data set as output to obtain the first visual servo model, where the first speed data set may be a first camera speed data set or a first arm-end speed data set. For example, the pixel coordinates of a first initial image feature and the pixel coordinates of the first desired image feature can each be scaled by a normalization factor k, and the scaled current and desired image features taken as the input of the neural network model; the speed (the speed of the camera or the speed of the arm end) determined by the PBVS controller from the camera pose corresponding to the first initial image feature and the camera pose corresponding to the first desired image feature is taken as the target output of the neural network model, and the neural network model is trained to obtain the first visual servo model. In this way, the output of the first visual servo model may be the speed of the camera or the speed of the arm end.
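Purely as an illustration, a minimal training sketch consistent with the description above might look as follows; the network architecture, the feature dimensions (four feature points), and the normalization factor k = 100 are assumptions, and the PBVS-determined speed is treated as a precomputed target:

```python
import torch
import torch.nn as nn

# Hypothetical dimensions: 4 feature points give 8 pixel coordinates each for
# the current and desired features, hence 16 inputs; the output is a 6-DoF
# velocity (3 linear + 3 angular) matching the PBVS controller's output.
model = nn.Sequential(nn.Linear(16, 128), nn.ReLU(),
                      nn.Linear(128, 128), nn.ReLU(),
                      nn.Linear(128, 6))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def train_step(initial_feats, desired_feats, pbvs_velocity, k=100.0):
    # Scale the current and desired image features by the normalization
    # factor k, predict a velocity, and regress it onto the PBVS target.
    x = torch.cat([initial_feats, desired_feats], dim=-1) / k
    predicted_velocity = model(x)
    loss = loss_fn(predicted_velocity, pbvs_velocity)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```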
In some embodiments, the first image feature data set may be input to the neural network model, which outputs a first predicted camera speed data set. The first predicted camera speed data set is converted into a first predicted arm-end speed data set according to a first initial hand-eye transfer function, where the first initial hand-eye transfer function includes first initial camera external parameters, which can be obtained by calibration. The neural network model is then trained based on the first predicted arm-end speed data set and the first speed data set (here, the first arm-end speed data set) to obtain the first visual servo model. In this case, the output of the first visual servo model is the speed of the camera.
Optionally, the first initial camera external parameters may also be adjusted based on the first predicted arm-end speed data set and the first speed data set (the first arm-end speed data set) to obtain first updated camera external parameters.
If the output of the first visual servo model is the speed of the camera, the speed can be converted by formula (1), which converts the speed of the camera in the camera coordinate system into the speed of the arm end in the arm base coordinate system:

$$ {}^{b}v_{tcp} = f_{vt}\left({}^{c}v_{c},\ {}^{b}R_{tcp},\ {}^{tcp}P_{c}\right) \qquad (1) $$

where ${}^{b}v_{tcp}$ is the speed of the arm end, ${}^{c}v_{c}$ is the speed of the camera, ${}^{b}R_{tcp}$ is the pose of the arm, ${}^{tcp}P_{c}$ is the camera external parameter, and $f_{vt}(\cdot)$ is the hand-eye transfer function.

When the speed conversion is performed, the camera external parameters may be the first initial camera external parameters (the calibrated camera external parameters) or the first updated camera external parameters.
The velocity of the arm end may include a linear velocity and an angular velocity, which can be obtained through formulas (2) to (9).

The linear and angular velocities of the new camera coordinate system relative to the arm base coordinate system can be calculated by formulas (2) and (3):

$$ {}^{b}v_{c'} = {}^{b}v_{c} + {}^{b}\dot{R}_{c}\,{}^{c}T_{c'} + {}^{b}R_{c}\,{}^{c}v_{c} \qquad (2) $$

$$ {}^{b}\omega_{c'} = {}^{b}\omega_{c} + {}^{b}R_{c}\,{}^{c}\omega_{c} \qquad (3) $$

where ${}^{b}v_{c}$ is the linear velocity of the original camera coordinate system relative to the arm base coordinate system, ${}^{b}\dot{R}_{c}$ is the derivative of the rotation matrix of the original camera coordinate system relative to the arm base coordinate system, ${}^{b}R_{c}$ is the rotation matrix of the original camera coordinate system relative to the arm base coordinate system, ${}^{c}T_{c'}$ is the translation of the new camera coordinate system relative to the original camera coordinate system, ${}^{c}v_{c}$ is the linear velocity of the camera in the original camera coordinate system, ${}^{b}\omega_{c}$ is the angular velocity of the original camera coordinate system relative to the arm base coordinate system, and ${}^{c}\omega_{c}$ is the angular velocity of the camera in the original camera coordinate system.

Since the linear and angular velocities of the original camera coordinate system relative to the arm base coordinate system are both 0, the linear and angular velocities of the new camera coordinate system relative to the arm base coordinate system reduce to formulas (4) and (5):

$$ {}^{b}v_{c'} = {}^{b}\dot{R}_{c}\,{}^{c}T_{c'} + {}^{b}R_{c}\,{}^{c}v_{c} \qquad (4) $$

$$ {}^{b}\omega_{c'} = {}^{b}R_{c}\,{}^{c}\omega_{c} \qquad (5) $$

The linear and angular velocities of the end effector in the arm base coordinate system can be calculated by formulas (6) and (7):

$$ {}^{b}v_{tcp} = {}^{b}v_{c'} + {}^{b}\dot{R}_{c'}\,{}^{c'}T_{tcp} + {}^{b}R_{c'}\,{}^{c'}v_{tcp} \qquad (6) $$

$$ {}^{b}\omega_{tcp} = {}^{b}\omega_{c'} + {}^{b}R_{c'}\,{}^{c'}\omega_{tcp} \qquad (7) $$

Since the arm end is rigidly connected to the camera, the linear and angular velocities of the arm end in the current camera coordinate system, ${}^{c'}v_{tcp}$ and ${}^{c'}\omega_{tcp}$, are 0. Substituting formula (4) and formula (5) into formula (6) and formula (7) gives the linear and angular velocities of the arm end in the arm base coordinate system, as shown in formulas (8) and (9):

$$ {}^{b}v_{tcp} = {}^{b}\dot{R}_{c}\,{}^{c}T_{tcp} + {}^{b}R_{c}\,{}^{c}v_{c} \qquad (8) $$

$$ {}^{b}\omega_{tcp} = {}^{b}R_{c}\,{}^{c}\omega_{c} \qquad (9) $$

where ${}^{b}\dot{R}_{c}$ is the derivative of the rotation matrix of the original camera coordinate system relative to the arm base coordinate system, ${}^{b}R_{c}$ is that rotation matrix, ${}^{c}T_{tcp}$ is the translation of the tool coordinate system relative to the original camera coordinate system, ${}^{c}v_{c}$ is the linear velocity of the camera in the original camera coordinate system, and ${}^{c}\omega_{c}$ is its angular velocity.
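Under the reconstruction above, and using the identity ${}^{b}\dot{R}_{c} = {}^{b}R_{c}\,[{}^{c}\omega_{c}]_{\times}$ for the derivative of the rotation matrix, a sketch of the camera-to-arm-end velocity conversion might be (variable names are illustrative):

```python
import numpy as np

def skew(w):
    # Skew-symmetric matrix [w]x such that skew(w) @ p equals the cross product w x p.
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def camera_to_tcp_velocity(v_c, w_c, R_bc, t_c_tcp):
    # v_c, w_c: camera linear/angular velocity in the original camera frame;
    # R_bc: rotation of the camera frame in the arm base frame;
    # t_c_tcp: tool-frame origin expressed in the camera frame.
    R_dot = R_bc @ skew(w_c)               # derivative of the rotation matrix
    v_tcp = R_dot @ t_c_tcp + R_bc @ v_c   # formula (8), as reconstructed
    w_tcp = R_bc @ w_c                     # formula (9), as reconstructed
    return v_tcp, w_tcp
```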
Fig. 5 is a schematic diagram of a method for obtaining a visual servo model according to an embodiment of the present application.
When the first visual servo model is applied in a use environment whose environment parameters (camera intrinsic parameters, camera external parameters, the 3D model of the object, and so on) differ to some degree from those used in training, the output of the first visual servo model may be less than ideal because of these differences. For example, when a first visual servo model obtained by training on a simulation model is applied to a real use environment, its output may contain errors caused by the difference in environment parameters. For another example, when a first visual servo model obtained through a real machine test is applied in another use environment whose camera intrinsics, camera extrinsics, 3D model, and so on differ to some degree, its output may also contain errors. In such cases, to make the output speed of the model more accurate, the weights of the first visual servo model may be updated to obtain a second visual servo model. The method for acquiring the second visual servo model includes the following steps:
S510, acquiring a second image characteristic data set.
The second image feature data set includes a second desired image feature and a plurality of second initial image features.
S520, acquiring a second speed data set.
The speeds in the second speed data set are determined according to the camera pose corresponding to the second desired image feature and the camera poses corresponding to the plurality of second initial image features, and correspond one by one to the plurality of second initial image features.
For example, the second set of speed data may be determined by the PBVS controller based on camera poses corresponding to the plurality of second initial image features and camera poses corresponding to the second desired image features.
The one-to-one correspondence between the speeds in the second speed data set and the plurality of second initial image features can be understood as follows: the second desired image feature and a given second initial image feature are used as the input of the first visual servo model, and the speed determined from the camera pose corresponding to the second desired image feature and the camera pose corresponding to that second initial image feature is used as the target output of the first visual servo model. Alternatively, the second desired image feature and a given second initial image feature are used as the input of the first visual servo model to obtain a predicted value; the predicted value is post-processed (for example, by coordinate conversion) to obtain a post-processed predicted value (a coordinate-converted value), and the speed determined from the two camera poses is used as the target value for that converted prediction.
In some embodiments, the corresponding camera pose may be acquired from the image information, for example by simulation, or by PnP combined with an extended Kalman filter (EKF).
In some embodiments, the camera pose at the time of capturing the image may be determined by a robotic system. For example, a camera may capture image information at a certain camera pose, which may be acquired by a robotic system.
And S530, updating the weight of the first visual servo model based on the second image characteristic data set and the second speed data set to obtain a second visual servo model.
In some embodiments, the output of the first visual servo model is the speed of the camera and the second speed data set is a second arm-end speed data set. The second image feature data set may be input to the first visual servo model, which outputs a second predicted camera speed data set; the second predicted camera speed data set is converted into a second predicted arm-end speed data set according to a second initial hand-eye transfer function, which includes second initial camera external parameters; and the weights of the first visual servo model are updated based on the second predicted arm-end speed data set and the second speed data set to obtain the second visual servo model.
Optionally, the second initial camera external parameters may be updated based on the second predicted arm-end speed data set and the second speed data set (the second arm-end speed data set) to obtain second updated camera external parameters.
Errors in the camera intrinsic parameters and in the 3D model of the object affect the image features, and ultimately the output of the model; that is, the output of the first visual servo model contains errors. Errors in the camera external parameters affect the conversion from the camera speed to the arm-end speed, and ultimately affect the arm-end speed.
Errors in the camera intrinsics, the camera extrinsics, and the 3D model of the object cause errors in the camera speed output by the first visual servo model and in the converted arm-end speed, so that the converted arm-end speed ${}^{b}\hat{v}_{tcp}$ differs from the arm-end speed ${}^{b}v_{tcp}^{PBVS}$ output by the PBVS controller, as shown in formula (10):

$$ {}^{b}\hat{v}_{tcp} = f_{vt}\left({}^{c}\hat{v}_{c},\ {}^{b}R_{tcp},\ {}^{tcp}\hat{P}_{c}\right) \neq {}^{b}v_{tcp}^{PBVS} \qquad (10) $$

where ${}^{c}\hat{v}_{c}$ is the camera speed output by the first visual servo model, and ${}^{tcp}\hat{P}_{c}$ is the camera external parameter containing errors. The camera external parameters containing errors may also be called the second initial camera external parameters; they may be obtained by calibration, may be the first updated camera external parameters, or may be the first initial camera external parameters.
Because the converted arm-end speed contains errors, the generated motion trajectory may not be a straight line, the path length increases, and the convergence speed decreases.
For example, the mean squared error loss function shown in formula (11) may be used to measure the difference between the converted arm-end speed and the arm-end speed output by the PBVS controller, thereby updating the weights of the first visual servo model and the camera external parameters:

$$ L = \left\| f_{vt}\left({}^{c}\hat{v}_{c},\ {}^{b}R_{tcp},\ {}^{tcp}\hat{P}_{c}\right) - {}^{b}v_{tcp}^{PBVS} \right\|^{2} \qquad (11) $$

By updating the weights of the first visual servo model and the camera external parameters through formula (11), the corrected arm-end speed can be obtained, as shown in formula (12):

$$ {}^{b}\tilde{v}_{tcp} = f_{vt}\left({}^{c}\tilde{v}_{c},\ {}^{b}R_{tcp},\ {}^{tcp}\tilde{P}_{c}\right) \qquad (12) $$

where ${}^{c}\tilde{v}_{c}$ is the camera speed output by the first visual servo model after its weights are updated, and ${}^{tcp}\tilde{P}_{c}$ is the updated camera external parameter (the second updated camera external parameter).
Through the conversions of formula (13) and formula (14), the difference $\Delta P_{dif}$ between the camera external parameters assumed to contain errors, ${}^{tcp}\hat{P}_{c}$ (the second initial camera external parameters), and the updated camera external parameters ${}^{tcp}\tilde{P}_{c}$ (the second updated camera external parameters) is obtained, as shown in formula (13):

$$ \Delta P_{dif} = \left({}^{tcp}\hat{P}_{c}\right)^{-1}\,{}^{tcp}\tilde{P}_{c} \qquad (13) $$

For ease of calculation, $\Delta P_{dif}$ can be converted into a Lie algebra element by the logarithmic map, as shown in formula (14):

$$ \xi_{dif} = f_{log}\left(\Delta P_{dif}\right) \qquad (14) $$

where $\xi_{dif}$ is the camera external parameter error in the Lie algebra and $f_{log}(\cdot)$ is the logarithmic mapping function.
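A simplified sketch of formulas (13) and (14) follows: the rotation part of the logarithm is taken via the axis-angle map and the translation component is kept directly, whereas a full SE(3) logarithm would also apply the inverse left Jacobian to the translation; all names are illustrative:

```python
import numpy as np
from scipy.spatial.transform import Rotation

def pose_delta_log(P_hat, P_updated):
    # P_hat, P_updated: 4x4 homogeneous transforms for the erroneous and the
    # updated camera external parameters.
    dP = np.linalg.inv(P_hat) @ P_updated                    # formula (13), as reconstructed
    rot_log = Rotation.from_matrix(dP[:3, :3]).as_rotvec()   # so(3) part of the log map
    trans = dP[:3, 3]   # a full SE(3) log would apply the inverse left Jacobian here
    return np.concatenate([rot_log, trans])                  # xi_dif, cf. formula (14)
```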
In some embodiments, the output of the first visual servo model may be the speed of the arm end, and the second speed data set may be a second arm-end speed data set. The second image feature data set may be input to the first visual servo model, which outputs a second predicted arm-end speed data set; the weights of the first visual servo model are then updated based on the second predicted arm-end speed data set and the second speed data set to obtain the second visual servo model.
In some embodiments, the output of the first visual servo model may be the speed of the camera and the second speed data set may be a second camera speed data set. The second image feature data set may be input to the first visual servo model, which outputs a second predicted camera speed data set; the weights of the first visual servo model are then updated based on the second predicted camera speed data set and the second speed data set to obtain the second visual servo model.
Fig. 6 is a schematic flow chart of a visual servoing method according to an embodiment of the application.
S610, acquiring target data.
The target data includes initial image features and desired image features.
S620, inputting the target data into the first visual servo model or the second visual servo model, and outputting the target speed.
The target speed may be a target camera speed or a target robot end speed.
In some embodiments, the first visual servoing model is obtained by training the neural network model based on a first image feature dataset comprising a first desired image feature and a plurality of first initial image features and a first velocity dataset whose velocity is determined from a camera pose corresponding to the first desired image feature and a camera pose corresponding to the plurality of first initial image features, the velocity of the first velocity dataset being in one-to-one correspondence with the plurality of first initial image features.
In some embodiments, the second visual servoing model is obtained based on a second image feature dataset comprising a second desired image feature and a plurality of second initial image features and a second speed dataset determined from a camera pose corresponding to the second desired image feature and a camera pose corresponding to the plurality of second initial image features, the speeds in the second speed dataset being in one-to-one correspondence with the plurality of second initial image features.
In some embodiments, the first speed data set is a first camera speed data set or a first robotic arm tip speed data set. The first visual servo model is obtained by training the neural network model by taking the first image characteristic data set as an input and taking the first speed data set as an output.
In some embodiments, the first speed data set is a first arm-end speed data set. The first visual servo model is obtained by inputting the first image feature data set into the neural network model to output a first predicted camera speed data set; converting the first predicted camera speed data set into a first predicted arm-end speed data set according to a first initial hand-eye transfer function, the first initial hand-eye transfer function including first initial camera external parameters; and adjusting the weights of the neural network model based on the first predicted arm-end speed data set and the first speed data set.

In some embodiments, the output of the first visual servo model is the speed of the camera and the second speed data set is a second arm-end speed data set. The second visual servo model is obtained by inputting the second image feature data set into the first visual servo model to output a second predicted camera speed data set; converting the second predicted camera speed data set into a second predicted arm-end speed data set according to a second initial hand-eye transfer function, the second initial hand-eye transfer function including second initial camera external parameters; and updating the weights of the first visual servo model based on the second predicted arm-end speed data set and the second speed data set.

In some embodiments, the output of the first visual servo model is the speed of the arm end and the second speed data set is a second arm-end speed data set. The second visual servo model is obtained by inputting the second image feature data set into the first visual servo model to output a second predicted arm-end speed data set, and updating the weights of the first visual servo model according to the second predicted arm-end speed data set and the second speed data set.

In some embodiments, the output of the first visual servo model may be the speed of the camera and the second speed data set may be a second camera speed data set. The second visual servo model is obtained by inputting the second image feature data set into the first visual servo model to output a second predicted camera speed data set, and then updating the weights of the first visual servo model based on the second predicted camera speed data set and the second speed data set.
And S630, controlling the robot to move according to the initial image characteristics, the expected image characteristics and the target speed.
If the target speed is the target camera speed, the target camera speed may be converted into the target arm-end speed according to formula (15):

$$ {}^{b}v_{tcp}^{*} = f_{vt}\left({}^{c}v_{c}^{*},\ {}^{b}R_{tcp},\ {}^{tcp}P_{c}\right) \qquad (15) $$

where ${}^{b}v_{tcp}^{*}$ is the target arm-end speed, ${}^{c}v_{c}^{*}$ is the target camera speed output by the first visual servo model or the second visual servo model, ${}^{b}R_{tcp}$ is the pose of the arm, ${}^{tcp}P_{c}$ is an initial camera external parameter (the first initial camera external parameter or the second initial camera external parameter) or an updated camera external parameter (the first updated camera external parameter or the second updated camera external parameter), and $f_{vt}(\cdot)$ is the hand-eye transfer function.
For example, if the target speed is the target camera speed, a first updated hand-eye transfer function may further be obtained, where the first updated hand-eye transfer function includes the first updated camera external parameters obtained after updating the first initial camera external parameters based on the first predicted arm-end speed data set and the first speed data set. The target camera speed is converted into the target arm-end speed according to the first updated hand-eye transfer function, and the robot is controlled to move according to the initial image features, the desired image features, and the target arm-end speed.
For example, if the target speed is the target camera speed, a second updated hand-eye transfer function may further be obtained, where the second updated hand-eye transfer function includes the second updated camera external parameters obtained after updating the second initial camera external parameters based on the second predicted arm-end speed data set and the second speed data set. The target camera speed is converted into the target arm-end speed according to the second updated hand-eye transfer function, and the robot is then controlled to move according to the initial image features, the desired image features, and the target arm-end speed.
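Putting the above steps together, one iteration of the servo loop might be sketched as follows; `extract_features`, `f_vt`, `send_velocity`, and the other names are illustrative placeholders, not interfaces defined by the present application:

```python
import numpy as np

def servo_step(model, extract_features, f_vt, camera_image, desired_feats,
               R_b_tcp, tcp_P_c, send_velocity, k=100.0):
    # One servo iteration: extract the current image features (e.g. with the
    # SCN), query the servo model for a target camera speed, convert it to a
    # target arm-end speed with the hand-eye transfer function of formula (15),
    # and command the robot.
    current_feats = extract_features(camera_image)
    x = np.concatenate([current_feats, desired_feats]) / k
    target_camera_speed = model(x)                  # first or second visual servo model
    target_tcp_speed = f_vt(target_camera_speed, R_b_tcp, tcp_P_c)
    send_velocity(target_tcp_speed)
    return np.linalg.norm(current_feats - desired_feats)   # feature error for a stop test
```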
FIG. 7 is a schematic diagram of a visual servo model and servo tracks of a conventional IBVS controller according to an embodiment of the present application.
When the environment parameters such as the camera intrinsic parameters, the camera external parameters, and the 3D model of the target are known, the trajectory of the visual servo model in 3D space is closer to a straight line, and the control performance of the visual servo model is better than that of a conventional IBVS controller.
Fig. 7 (a) and fig. 7 (b) show, respectively, the servo results of the visual servo model provided by the present application and of the IBVS controller when the camera successfully reaches the desired pose. The servo trajectory of the visual servo model provided by the present application is closer to a straight line, while that of the IBVS controller is closer to a curve.
Fig. 7 (c) and fig. 7 (d) show, respectively, the servo results of the visual servo model provided by the present application and of the IBVS controller when controlling the camera toward the desired camera pose. The servo trajectory of the visual servo model provided by the present application is essentially a straight line, and the camera successfully reaches the desired camera pose. The IBVS controller may fail because feature points move out of the camera's field of view, so that the camera does not reach the desired camera pose.
Therefore, compared with the IBVS controller, the servo track of the visual servo model provided by the embodiment of the application is closer to a straight line, so that the characteristic points are in the visual field range of the camera, and the probability of successful servo is improved.
FIG. 8 is a schematic diagram of the servo performance of a visual servo model and a conventional IBVS controller provided by an embodiment of the present application.
A random offset d may be superimposed on a position 0.3 meters (m) directly above the desired camera pose (the camera pose corresponding to the desired image feature) to obtain the initial camera pose (the camera pose corresponding to the initial image feature). That is, a height of (d + 0.3) m is used for the initial camera pose, with 0 ≤ d ≤ (h − 0.3), where h is the maximum height the camera pose can reach. The average distance between the initial camera pose and the desired camera pose increases as d increases.
The servo success rate can be used as one parameter for measuring the servo performance of the visual servo model provided by the embodiment of the present application and of the IBVS controller. For example, a criterion for a successful servo may be: the deflection angle between the camera pose at the end of the servo and the desired camera pose is less than 0.017 rad (1°), and the distance between them is less than a given distance threshold. Fig. 8 (a) shows the servo success rates of the visual servo model provided by the present application and of the IBVS controller under different initial camera poses. Under the different initial camera poses, the servo success rate of the visual servo model provided by the present application remains essentially at 100%, while the servo success rate of the IBVS controller gradually decreases from 60% to 20% as the average distance between the initial camera pose and the desired camera pose increases.
The average step size can also be used as a further parameter for measuring the servo performance of the visual servo model provided by the present application and of the IBVS controller. Fig. 8 (b) shows the average step sizes of the two under different initial camera poses. As the average distance between the initial camera pose and the desired camera pose increases, the average step size of the visual servo model provided by the present application remains substantially near 100, while that of the IBVS controller gradually increases from 310 to approximately 400. The average step size of the visual servo model controller provided by the present application is therefore far smaller than that of the IBVS controller: its motion trajectory is closer to a straight line, whereas the motion trajectory of the IBVS controller is a curve. Thus, the visual servo model controller provided by the present application can shorten the servo time and avoid the problem of losing feature information.
Fig. 8 (c) shows the deflection angles between the camera pose at the end of the servo and the desired camera pose for the visual servo model provided by the present application and for the IBVS controller under different initial camera poses. Under the different initial camera poses, the deflection angle (in radians) at the end of the servo of the visual servo model is essentially within 0.01, while that of the IBVS controller is between 0.02 and 0.03. That is, under different initial camera poses, the deflection angle between the camera pose at the end of the servo of the visual servo model provided by the present application and the desired camera pose is smaller than that of the IBVS controller.
Fig. 8 (d) shows the distances between the camera pose at the end of the servo and the desired camera pose for the visual servo model controller provided by the present application and for the IBVS controller under different initial camera poses. Under the different initial camera poses, the average distance between the camera pose at the end of the servo of the visual servo model controller and the desired camera pose is 0.001 m, while that under the IBVS controller is between 0.003 and 0.005 m. That is, under different initial camera poses, the average distance between the camera pose at the end of the servo of the visual servo model and the desired camera pose is smaller than that under the IBVS controller.
As the average distance between the initial camera pose and the desired camera pose increases, the deflection angle and average distance between the camera pose at the end of the IBVS controller's servo and the desired camera pose show a decreasing trend. This is because the poses allowed for a successful servo form a region, and the motion trajectory of the IBVS controller in space is a curve, which produces a relatively large pose deviation on entering the region; as the average distance between the initial and desired camera poses increases, the pose error with which the servo can still successfully enter the region gradually decreases.
FIG. 9 is a schematic diagram of the servo performance of a visual servo model and a conventional PBVS controller provided by an embodiment of the present application.
The servo success rate can be used as one parameter for measuring the servo performance of the visual servo model provided by the embodiment of the present application and of the PBVS controller. For example, a criterion for a successful servo may be: the deflection angle between the camera pose at the end of the servo and the desired camera pose is less than 0.017 rad (1°), and the distance between them is less than a given distance threshold. Fig. 9 (a) shows the servo success rates of the visual servo model provided by the present application and of the PBVS controller under 3D object models with different errors. As the error of the 3D model of the object gradually increases, the servo success rates of both remain essentially at 100%.
The average step size can also be used as a further parameter for measuring the servo performance of the visual servo model and the PBVS controller provided by the present application. Fig. 9 (b) shows the average step sizes of the visual servo model and the PBVS controller provided by the application under the object 3D model with different errors. With the gradual increase of the error of the 3D model of the object, the average step length of the visual servo model and the average step length of the PBVS controller provided by the application are between 100 and 125.
Fig. 9 (c) shows the deflection angles between the camera pose at the end of the servo and the desired camera pose for the visual servo model provided by the present application and for the PBVS controller under 3D object models with different errors. As the error of the 3D model of the object increases, the deflection angle (in radians) at the end of the servo of the visual servo model remains essentially within 0.01, while that of the PBVS controller gradually increases to more than 0.04.
Fig. 9 (d) shows the distances between the camera pose at the end of the servo and the desired camera pose for the visual servo model controller provided by the present application and for the PBVS controller under 3D object models with different errors. As the error of the 3D model of the object gradually increases, the average distance between the camera pose at the end of the servo of the visual servo model controller provided by the present application and the desired camera pose does not exceed 0.002 m, while that of the PBVS controller gradually increases to 0.006 m. That is, as the error of the 3D model of the object increases, the average distance between the camera pose at the end of the servo of the visual servo model provided by the present application and the desired camera pose is essentially smaller than that of the PBVS controller.
As can be seen from fig. 9, the servo success rate and average step size of the visual servo controller provided by the present application differ little from those of the conventional PBVS controller, and its trajectory in 3D space can be closer to a straight line. As the error of the 3D model of the object gradually increases, the average deflection angle and average distance between the camera pose at the end of the servo and the desired camera pose remain small for the visual servo controller provided by the present application, and in most cases are smaller than those of the PBVS controller; they also change little, showing good robustness to 3D object models with different errors. For the PBVS controller, in contrast, the average deflection angle and average distance increase greatly as the error of the 3D model of the object increases, and the control accuracy decreases.
An embodiment of the device of the present application will be described in detail below with reference to fig. 10 to 13. It should be understood that the apparatus in the embodiments of the present application may perform the method in the foregoing embodiments of the present application, that is, specific working procedures of various products may refer to corresponding procedures in the foregoing method embodiments.
Fig. 10 is a schematic diagram of an apparatus 1000 for obtaining a visual servo model according to an embodiment of the present application.
It should be appreciated that the apparatus 1000 may perform the method of acquiring the visual servoing model of fig. 4 or 5. For example, the apparatus 1000 may be the training device 220 of fig. 2, or the neural network controller 310 of fig. 3. The apparatus 1000 comprises: an acquisition unit 1010 and a processing unit 1020.
Wherein the acquiring unit 1010 is configured to acquire a first image feature data set, where the first image feature data set includes a first desired image feature and a plurality of first initial image features; an acquiring unit 1010, configured to acquire a first speed data set, where a speed of the first speed data set is determined according to a camera pose corresponding to a first desired image feature and a camera pose corresponding to a plurality of first initial image features, and the speed of the first speed data set corresponds to the plurality of first initial image features one to one; the processing unit 1020 is configured to train the neural network model based on the first image feature data set and the first velocity data set to obtain a first visual servo model.
Optionally, as an embodiment, the first speed data set is a first camera speed data set or a first arm-end speed data set, and the processing unit 1020 is configured to: train the neural network model with the first image feature data set as input and the first speed data set as output to obtain the first visual servo model.
Optionally, as an embodiment, the first speed data set is a first arm-end speed data set, and the processing unit 1020 is configured to: input the first image feature data set into the neural network model and output a first predicted camera speed data set; convert the first predicted camera speed data set into a first predicted arm-end speed data set according to a first initial hand-eye transfer function, the first initial hand-eye transfer function including first initial camera external parameters; and adjust the weights of the neural network model based on the first predicted arm-end speed data set and the first speed data set to obtain the first visual servo model.
Optionally, as an embodiment, the acquiring unit 1010 is configured to acquire a second image feature data set, where the second image feature data set includes a second desired image feature and a plurality of second initial image features; the acquiring unit 1010 is configured to acquire a second speed data set, where the speeds in the second speed data set are determined according to the camera pose corresponding to the second desired image feature and the camera poses corresponding to the plurality of second initial image features, and correspond one by one to the plurality of second initial image features; and the processing unit 1020 is configured to update the weights of the first visual servo model based on the second image feature data set and the second speed data set to obtain the second visual servo model.
Optionally, as an embodiment, the output of the first visual servo model is the speed of the camera and the second speed data set is a second arm-end speed data set; the processing unit 1020 is specifically further configured to: input the second image feature data set into the first visual servo model and output a second predicted camera speed data set; convert the second predicted camera speed data set into a second predicted arm-end speed data set according to a second initial hand-eye transfer function, the second initial hand-eye transfer function including second initial camera external parameters; and update the weights of the first visual servo model based on the second predicted arm-end speed data set and the second speed data set to obtain the second visual servo model.
Optionally, as an embodiment, the output of the first visual servo model is the speed of the arm end and the second speed data set is a second arm-end speed data set; the processing unit 1020 is specifically further configured to: input the second image feature data set into the first visual servo model and output a second predicted arm-end speed data set; and update the weights of the first visual servo model based on the second predicted arm-end speed data set and the second speed data set to obtain the second visual servo model.
Fig. 11 is a schematic diagram of a visual servoing device 1100 according to an embodiment of the application. It should be appreciated that the apparatus 1100 may be used to perform the visual servoing method of fig. 6. The apparatus 1100 may include an acquisition unit 1110 and a processing unit 1120.
Wherein, the acquiring unit 1110 is configured to acquire target data, where the target data includes an initial image feature and a desired image feature; the processing unit 1120 is configured to input target data into a first visual servo model or a second visual servo model, and output a target speed, where the target speed is a camera speed or a robot arm end speed, the first visual servo model is obtained by training a neural network model based on a first image feature dataset and a first speed dataset, the first image feature dataset includes a first desired image feature and a plurality of first initial image features, the speed of the first speed dataset is determined according to a camera pose corresponding to the first desired image feature and a camera pose corresponding to the plurality of first initial image features, the speed of the first speed dataset is one-to-one corresponding to the plurality of first initial image features, the second visual servo model is obtained by updating weights of the first visual servo model based on a second image feature dataset and a second speed dataset, the second image feature dataset includes a second desired image feature and a plurality of second initial image features, and the speed in the second speed dataset is determined according to a camera pose corresponding to the second desired image feature and a plurality of second initial image features, and the speed in the second speed dataset is one-to-one corresponding to-second initial image features; a processing unit 1120 for controlling the robot movement according to the initial image characteristic, the desired image characteristic and the target speed.
Optionally, as an embodiment, the first speed dataset is a first camera speed dataset or a first robot end speed dataset; the first visual servoing model is obtained by training the neural network model based on the first image feature dataset and the first speed dataset, comprising: the first visual servo model is obtained by training the neural network model by taking the first image characteristic data set as an input and taking the first speed data set as an output.
Optionally, as an embodiment, the first speed data set is a first arm-end speed data set, and the first visual servo model being obtained by training the neural network model based on the first image feature data set and the first speed data set includes: inputting the first image feature data set into the neural network model to output a first predicted camera speed data set; converting the first predicted camera speed data set into a first predicted arm-end speed data set according to a first initial hand-eye transfer function, the first initial hand-eye transfer function including first initial camera external parameters; and adjusting the weights of the neural network model based on the first predicted arm-end speed data set and the first speed data set.
Alternatively, as an embodiment, if the target speed is the target camera speed; the acquisition unit 1110 is further configured to: acquiring a first updated hand-eye transfer function, wherein the first updated hand-eye transfer function comprises a first updated camera external parameter, and the first updated camera external parameter is obtained after updating a first initial camera external parameter based on a first prediction mechanical arm tail end speed data set and a first speed data set; the processing unit 1120 is configured to: converting the target camera speed into a target mechanical arm tail end speed according to a first updated hand-eye conversion function; and controlling the robot to move according to the initial image characteristics, the expected image characteristics and the tail end speed of the target mechanical arm.
Optionally, as an embodiment, the output of the first visual servo model is a camera speed and the second speed data set is a second robot arm end speed data set; that the second visual servo model is obtained by updating the weights of the first visual servo model based on the second image feature data set and the second speed data set includes: the second image feature data set is input into the first visual servo model, which outputs a second predicted camera speed data set; the second predicted camera speed data set is converted into a second predicted robot arm end speed data set according to a second initial hand-eye conversion function, where the second initial hand-eye conversion function includes a second initial camera external parameter; and the weights of the first visual servo model are updated based on the second predicted robot arm end speed data set and the second speed data set.
Optionally, as an embodiment, if the target speed is a target camera speed, the acquisition unit 1110 is further configured to: acquire a second updated hand-eye conversion function, where the second updated hand-eye conversion function includes a second updated camera external parameter, and the second updated camera external parameter is obtained after updating the second initial camera external parameter based on the second predicted robot arm end speed data set and the second speed data set. The processing unit 1120 is configured to: convert the target camera speed into a target robot arm end speed according to the second updated hand-eye conversion function; and control the robot to move according to the initial image feature, the desired image feature, and the target robot arm end speed.
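How the updated camera external parameter is computed is not spelled out at this point; one assumed realization is a least-squares refinement of (R, t) so that the converted predicted camera twists match the recorded arm-end twists, sketched below (names hypothetical; reuses torch and twist_adjoint from above, and for brevity optimizes R as an unconstrained matrix, whereas a production version would keep R on SO(3)):

```python
def update_extrinsics(v_cam_pred, v_end_labels, R, t, lr=1e-3, steps=500):
    """Refine the hand-eye extrinsics so that converted predicted camera
    twists match the recorded arm-end twists (assumed least-squares form)."""
    R = R.clone().requires_grad_(True)
    t = t.clone().requires_grad_(True)
    opt = torch.optim.Adam([R, t], lr=lr)
    for _ in range(steps):
        Ad = twist_adjoint(R, t)
        loss = torch.nn.functional.mse_loss(v_cam_pred @ Ad.T, v_end_labels)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return R.detach(), t.detach()
```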
Optionally, as an embodiment, the output of the first visual servo model is a robot arm end speed, and the second speed data set is a second robot arm end speed data set; that the second visual servo model is obtained by updating the weights of the first visual servo model based on the second image feature data set and the second speed data set includes: the second image feature data set is input into the first visual servo model, which outputs a second predicted robot arm end speed data set; and the weights of the first visual servo model are updated based on the second predicted robot arm end speed data set and the second speed data set.
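For this last variant, in which the model already outputs arm-end speeds, obtaining the second model reduces to ordinary fine-tuning on the second data set. A minimal sketch follows, reusing torch from above; the learning rate and step count are illustrative assumptions:

```python
def finetune_second_model(first_model, features2, v_end_labels2,
                          lr=1e-4, steps=200):
    """Update the first model's weights on the second data set; the
    returned network plays the role of the second visual servo model."""
    optimizer = torch.optim.Adam(first_model.parameters(), lr=lr)
    for _ in range(steps):
        v_pred = first_model(features2)       # predicted arm-end twists
        loss = torch.nn.functional.mse_loss(v_pred, v_end_labels2)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return first_model
```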
The above-described apparatus 1000 and apparatus 1100 are embodied in the form of functional units. The term "unit" herein may be implemented in software and/or hardware, without specific limitation.
For example, a "unit" may be a software program, a hardware circuit, or a combination of both that implements the functions described above. The hardware circuitry may include an application specific integrated circuit (application specific integrated circuit, ASIC), an electronic circuit, a processor (e.g., a shared processor, a dedicated processor, or a group processor) and memory for executing one or more software or firmware programs, a combinational logic circuit, and/or other suitable components that support the described functions.
Thus, the elements of the examples described in the embodiments of the present application can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
Fig. 12 is a schematic diagram of the hardware structure of an apparatus for obtaining a visual servo model according to an embodiment of the present application. The apparatus 3000 for acquiring a visual servoing model shown in fig. 12 (the apparatus 3000 may specifically be a computer device) includes a memory 3001, a processor 3002, a communication interface 3003, and a bus 3004. The memory 3001, the processor 3002, and the communication interface 3003 are connected to each other by the bus 3004. For example, the apparatus 3000 may be the training device 220 of fig. 2, or the neural network controller 310 of fig. 3.
The memory 3001 may be a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (random access memory, RAM). The memory 3001 may store a program; when the program stored in the memory 3001 is executed by the processor 3002, the processor 3002 is configured to perform the steps of the method of obtaining a visual servoing model according to an embodiment of the present application. Illustratively, the processor 3002 may perform steps S410 through S430 in the method shown in fig. 4 above or steps S510 through S530 in the method shown in fig. 5.
The processor 3002 may employ a general-purpose central processing unit (central processing unit, CPU), microprocessor, application specific integrated circuit (application specific integrated circuit, ASIC), graphics processor (graphics processing unit, GPU) or one or more integrated circuits for executing associated programs to implement the methods of obtaining visual servoing models of embodiments of the present application.
The processor 3002 may also be an integrated circuit chip with signal processing capabilities. In implementation, the steps of the method of obtaining a visual servoing model of the present application may be completed by an integrated logic circuit of hardware in the processor 3002 or by instructions in the form of software.
The processor 3002 may also be a general purpose processor, a digital signal processor (digital signal processing, DSP), an application specific integrated circuit (ASIC), a field programmable gate array (field programmable gate array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or perform the methods, steps, and logical block diagrams disclosed in the embodiments of the present application. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 3001; the processor 3002 reads the information in the memory 3001 and, in combination with its hardware, performs the functions required to be performed by the units included in the apparatus for acquiring a visual servo model according to the embodiment of the present application, or performs the method for acquiring a visual servo model according to the embodiment of the present application.
The communication interface 3003 enables communications between the apparatus 3000 and other devices or communication networks using a transceiving apparatus such as, but not limited to, a transceiver. For example, the first image feature data set, the first speed data set, the second image feature data set, the second speed data set, or the like may be acquired through the communication interface 3003.
A bus 3004 may include a path to transfer information between various components of the device 3000 (e.g., memory 3001, processor 3002, communication interface 3003).
Fig. 13 is a schematic diagram of the hardware structure of a visual servoing device according to an embodiment of the application. The visual servoing device 4000 shown in fig. 13 includes a memory 4001, a processor 4002, a communication interface 4003, and a bus 4004. The memory 4001, the processor 4002, and the communication interface 4003 are connected to each other by the bus 4004.
The memory 4001 may be a ROM, a static storage device, and a RAM. The memory 4001 may store programs, and when the programs stored in the memory 4001 are executed by the processor 4002, the processor 4002 and the communication interface 4003 are used to perform the respective steps of the visual servoing method of an embodiment of the present application. Specifically, the processor 4002 may perform step S610 to step S630 in the method shown in fig. 6 above.
The processor 4002 may employ a general-purpose CPU, microprocessor, ASIC, GPU, or one or more integrated circuits for executing associated programs to perform the functions required by the elements in the visual servoing device of an embodiment of the application, or to perform the visual servoing method of a method embodiment of the application.
The processor 4002 may also be an integrated circuit chip having signal processing capabilities, for example, the chip shown in fig. 5. In implementation, the steps of the visual servoing method of an embodiment of the application may be completed by an integrated logic circuit of hardware in the processor 4002 or by instructions in the form of software.
The processor 4002 may also be a general purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or perform the methods, steps, and logical block diagrams disclosed in the embodiments of the present application. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 4001; the processor 4002 reads the information in the memory 4001 and, in combination with its hardware, performs the functions to be executed by the units included in the visual servoing device of an embodiment of the present application, or performs the visual servoing method of an embodiment of the present application.
The communication interface 4003 enables communication between the apparatus 4000 and other devices or communication networks using a transceiving apparatus such as, but not limited to, a transceiver. For example, the target data may be acquired through the communication interface 4003.
Bus 4004 may include a path for transferring information between various components of device 4000 (e.g., memory 4001, processor 4002, communication interface 4003).
It should be noted that although the above-described apparatus 3000 and apparatus 4000 show only a memory, a processor, and a communication interface, those skilled in the art will appreciate that, in a specific implementation, the apparatus 3000 and the apparatus 4000 may also include other devices necessary for normal operation. Also, those skilled in the art will appreciate that the apparatus 3000 and the apparatus 4000 may include hardware devices that implement other additional functions, as needed. Furthermore, those skilled in the art will appreciate that the apparatus 3000 and the apparatus 4000 may include only the devices necessary to implement the embodiments of the present application, and need not include all of the devices shown in fig. 12 and fig. 13.
It is to be appreciated that the processor in embodiments of the application may be a central processing unit (central processing unit, CPU), but may also be other general purpose processors, digital signal processors (digital signal processor, DSP), application specific integrated circuits (application specific integrated circuit, ASIC), off-the-shelf programmable gate arrays (field programmable gate array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
It should also be appreciated that the memory in embodiments of the present application may be either volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory may be a read-only memory (read-only memory, ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (random access memory, RAM), which acts as an external cache. By way of example but not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the above-described embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product comprises one or more computer instructions or computer programs. When the computer instructions or computer programs are loaded or executed on a computer, the processes or functions described in accordance with embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired or wireless (e.g., infrared, radio, microwave) manner. The computer readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that contains one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium. The semiconductor medium may be a solid state disk.
It should be understood that the term "and/or" merely describes an association relationship between associated objects and indicates that three relationships may exist. For example, A and/or B may represent three cases: A alone, both A and B, and B alone, where A and B may be singular or plural. In addition, the character "/" herein generally indicates an "or" relationship between the associated objects, but may also indicate an "and/or" relationship, as may be understood from the context.
In the present application, "at least one" means one or more, and "a plurality" means two or more. "at least one of" or the like means any combination of these items, including any combination of single item(s) or plural items(s). For example, at least one (one) of a, b, or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, wherein a, b, c may be single or plural.
It should be understood that, in various embodiments of the present application, the sequence numbers of the foregoing processes do not mean the order of execution, and the order of execution of the processes should be determined by the functions and internal logic thereof, and should not constitute any limitation on the implementation process of the embodiments of the present application.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the present application essentially, or the part thereof contributing to the prior art, or a part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, or an optical disk.
The foregoing is merely a specific implementation of the present application, but the protection scope of the present application is not limited thereto. Any variation or replacement readily figured out by a person skilled in the art within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (30)

1. A method of obtaining a visual servoing model, comprising:
obtaining a first image feature dataset comprising a first desired image feature and a plurality of first initial image features;
acquiring a first speed data set, wherein the speeds in the first speed data set are determined according to the camera pose corresponding to the first desired image feature and the camera poses corresponding to the plurality of first initial image features, and the speeds in the first speed data set are in one-to-one correspondence with the plurality of first initial image features;
and training the neural network model based on the first image characteristic data set and the first speed data set to obtain a first visual servo model.
2. The method of claim 1, wherein the first speed data set is a first camera speed data set or a first robot arm end speed data set;
the training the neural network model based on the first image feature data set and the first speed data set to obtain a first visual servo model includes:
training the neural network model by taking the first image feature data set as input and the first speed data set as output to obtain the first visual servo model.
3. The method of claim 1, wherein the first speed data set is a first robot arm end speed data set;
the training the neural network model based on the first image feature data set and the first speed data set to obtain a first visual servo model includes:
inputting the first image feature data set into the neural network model, and outputting a first predicted camera speed data set;
converting the first predicted camera speed data set into a first predicted robot arm end speed data set according to a first initial hand-eye conversion function, the first initial hand-eye conversion function comprising a first initial camera external parameter;
and adjusting the weight of the neural network model based on the first predicted robot arm end speed data set and the first speed data set to obtain the first visual servo model.
4. A method according to any one of claims 1 to 3, further comprising:
obtaining a second image feature dataset comprising a second desired image feature and a plurality of second initial image features;
acquiring a second speed data set, wherein the speeds in the second speed data set are determined according to the camera pose corresponding to the second desired image feature and the camera poses corresponding to the plurality of second initial image features, and the speeds in the second speed data set are in one-to-one correspondence with the plurality of second initial image features;
and updating the weight of the first visual servo model based on the second image characteristic data set and the second speed data set to obtain a second visual servo model.
5. The method of claim 4, wherein the output of the first visual servo model is a camera speed and the second speed data set is a second robot arm end speed data set;
the updating the weight of the first visual servo model based on the second image feature data set and the second speed data set to obtain a second visual servo model comprises the following steps:
inputting the second image feature data set into the first visual servo model, and outputting a second predicted camera speed data set;
converting the second predicted camera speed data set into a second predicted robot arm end speed data set according to a second initial hand-eye conversion function, the second initial hand-eye conversion function comprising a second initial camera external parameter;
and updating the weight of the first visual servo model based on the second predicted robot arm end speed data set and the second speed data set to obtain the second visual servo model.
6. The method of claim 4, wherein the output of the first visual servo model is a robot arm end speed and the second speed data set is a second robot arm end speed data set;
the updating the weight of the first visual servo model based on the second image feature data set and the second speed data set to obtain a second visual servo model comprises the following steps:
inputting the second image feature data set into the first visual servo model, and outputting a second predicted robot arm end speed data set;
and updating the weight of the first visual servo model based on the second predicted robot arm end speed data set and the second speed data set to obtain the second visual servo model.
7. A visual servoing method, comprising:
acquiring target data, wherein the target data comprises an initial image feature and a desired image feature;
inputting the target data into a first visual servo model or a second visual servo model, and outputting a target speed, wherein the target speed is a camera speed or a robot arm end speed, the first visual servo model is obtained by training a neural network model based on a first image feature data set and a first speed data set, the first image feature data set comprises a first desired image feature and a plurality of first initial image features, the speeds in the first speed data set are determined according to the camera pose corresponding to the first desired image feature and the camera poses corresponding to the plurality of first initial image features, and the speeds in the first speed data set are in one-to-one correspondence with the plurality of first initial image features,
the second visual servo model is obtained by updating the weight of the first visual servo model based on a second image feature data set and a second speed data set, the second image feature data set comprises a second desired image feature and a plurality of second initial image features, the speeds in the second speed data set are determined according to the camera pose corresponding to the second desired image feature and the camera poses corresponding to the plurality of second initial image features, and the speeds in the second speed data set are in one-to-one correspondence with the plurality of second initial image features;
and controlling the robot to move according to the initial image feature, the desired image feature, and the target speed.
8. The method of claim 7, wherein the first speed data set is a first camera speed data set or a first robot arm end speed data set;
the first visual servo model is obtained by training a neural network model based on a first image feature data set and a first speed data set, comprising:
the first visual servo model is obtained by training the neural network model by taking the first image feature data set as input and the first speed data set as output.
9. The method of claim 7, wherein the first speed data set is a first robot arm end speed data set;
the first visual servo model is obtained by training a neural network model based on a first image feature data set and a first speed data set, comprising:
the first image feature data set is input into the neural network model, which outputs a first predicted camera speed data set; the first predicted camera speed data set is converted into a first predicted robot arm end speed data set according to a first initial hand-eye conversion function, the first initial hand-eye conversion function comprising a first initial camera external parameter; and the weight of the neural network model is adjusted based on the first predicted robot arm end speed data set and the first speed data set.
10. The method of claim 9, wherein if the target speed is a target camera speed, the method further comprises:
acquiring a first updated hand-eye conversion function, wherein the first updated hand-eye conversion function comprises a first updated camera external parameter, and the first updated camera external parameter is obtained after updating the first initial camera external parameter based on the first predicted robot arm end speed data set and the first speed data set;
the controlling the robot movement according to the initial image feature, the desired image feature, and the target speed comprises:
converting the target camera speed into a target robot arm end speed according to the first updated hand-eye conversion function;
and controlling the robot to move according to the initial image feature, the desired image feature, and the target robot arm end speed.
11. The method of any one of claims 7 to 9, wherein the output of the first visual servo model is a camera speed and the second speed data set is a second robot arm end speed data set;
the second visual servo model is obtained by updating the weight of the first visual servo model based on a second image feature data set and a second speed data set, comprising:
the second image feature data set is input into the first visual servo model, which outputs a second predicted camera speed data set; the second predicted camera speed data set is converted into a second predicted robot arm end speed data set according to a second initial hand-eye conversion function, the second initial hand-eye conversion function comprising a second initial camera external parameter; and the weight of the first visual servo model is updated based on the second predicted robot arm end speed data set and the second speed data set.
12. The method of claim 11, wherein if the target speed is a target camera speed, the method further comprises:
acquiring a second updated hand-eye conversion function, wherein the second updated hand-eye conversion function comprises a second updated camera external parameter, and the second updated camera external parameter is obtained after updating the second initial camera external parameter based on the second predicted robot arm end speed data set and the second speed data set;
the controlling the robot movement according to the initial image feature, the desired image feature, and the target speed comprises:
converting the target camera speed into a target robot arm end speed according to the second updated hand-eye conversion function;
and controlling the robot to move according to the initial image feature, the desired image feature, and the target robot arm end speed.
13. The method of claim 7 or 8, wherein the output of the first visual servo model is a robot arm end speed and the second speed data set is a second robot arm end speed data set;
the second visual servo model is obtained by updating the weight of the first visual servo model based on a second image feature data set and a second speed data set, comprising:
the second image feature data set is input into the first visual servo model, which outputs a second predicted robot arm end speed data set; and the weight of the first visual servo model is updated based on the second predicted robot arm end speed data set and the second speed data set.
14. An apparatus for obtaining a visual servoing model, comprising:
an acquisition unit configured to acquire a first image feature data set including a first desired image feature and a plurality of first initial image features;
the acquisition unit is configured to acquire a first speed data set, where the speeds in the first speed data set are determined according to the camera pose corresponding to the first desired image feature and the camera poses corresponding to the plurality of first initial image features, and the speeds in the first speed data set are in one-to-one correspondence with the plurality of first initial image features;
and a processing unit configured to train the neural network model based on the first image feature data set and the first speed data set to obtain a first visual servo model.
15. The apparatus of claim 14, wherein the first speed data set is a first camera speed data set or a first robot arm end speed data set;
the processing unit is used for:
and training the neural network model by taking the first image feature data set as input and the first speed data set as output to obtain the first visual servo model.
16. The apparatus of claim 14, wherein the first speed data set is a first robot arm end speed data set;
the processing unit is used for:
inputting the first image feature data set into the neural network model, and outputting a first predicted camera speed data set;
converting the first predicted camera speed data set into a first predicted robot arm end speed data set according to a first initial hand-eye conversion function, the first initial hand-eye conversion function comprising a first initial camera external parameter;
and adjusting the weight of the neural network model based on the first predicted robot arm end speed data set and the first speed data set to obtain the first visual servo model.
17. The apparatus according to any one of claims 14 to 16, further comprising:
the acquisition unit is configured to acquire a second image feature data set, the second image feature data set comprising a second desired image feature and a plurality of second initial image features;
the acquisition unit is configured to acquire a second speed data set, where the speeds in the second speed data set are determined according to the camera pose corresponding to the second desired image feature and the camera poses corresponding to the plurality of second initial image features, and the speeds in the second speed data set are in one-to-one correspondence with the plurality of second initial image features;
the processing unit is configured to update the weight of the first visual servo model based on the second image feature data set and the second speed data set to obtain a second visual servo model.
18. The apparatus of claim 17, wherein the output of the first visual servo model is a camera speed and the second speed data set is a second robot arm end speed data set;
the processing unit is further configured to:
inputting the second image feature data set into the first visual servo model, and outputting a second predicted camera speed data set;
converting the second predicted camera speed data set into a second predicted robot arm end speed data set according to a second initial hand-eye conversion function, the second initial hand-eye conversion function comprising a second initial camera external parameter;
and updating the weight of the first visual servo model based on the second predicted robot arm end speed data set and the second speed data set to obtain the second visual servo model.
19. The apparatus of claim 17, wherein the output of the first visual servo model is a robot arm end speed and the second speed data set is a second robot arm end speed data set;
the processing unit is used for:
inputting the second image feature data set into the first visual servo model, and outputting a second predicted robot arm end speed data set;
and updating the weight of the first visual servo model based on the second predicted robot arm end speed data set and the second speed data set to obtain the second visual servo model.
20. A visual servoing device, comprising:
an acquisition unit configured to acquire target data including an initial image feature and a desired image feature;
a processing unit configured to input the target data into a first visual servo model or a second visual servo model and output a target speed, wherein the target speed is a camera speed or a robot arm end speed, the first visual servo model is obtained by training a neural network model based on a first image feature data set and a first speed data set, the first image feature data set comprises a first desired image feature and a plurality of first initial image features, the speeds in the first speed data set are determined according to the camera pose corresponding to the first desired image feature and the camera poses corresponding to the plurality of first initial image features, and the speeds in the first speed data set are in one-to-one correspondence with the plurality of first initial image features,
the second visual servo model is obtained by updating the weight of the first visual servo model based on a second image feature data set and a second speed data set, the second image feature data set comprises a second desired image feature and a plurality of second initial image features, the speeds in the second speed data set are determined according to the camera pose corresponding to the second desired image feature and the camera poses corresponding to the plurality of second initial image features, and the speeds in the second speed data set are in one-to-one correspondence with the plurality of second initial image features;
the processing unit is further configured to control the robot to move according to the initial image feature, the desired image feature, and the target speed.
21. The apparatus of claim 20, wherein the first speed data set is a first camera speed data set or a first robot arm end speed data set;
the first visual servo model is obtained by training a neural network model based on a first image feature data set and a first speed data set, comprising:
the first visual servo model is obtained by training the neural network model by taking the first image feature data set as input and the first speed data set as output.
22. The apparatus of claim 20, wherein the first speed data set is a first robot arm end speed data set;
the first visual servo model is obtained by training a neural network model based on a first image feature data set and a first speed data set, comprising:
the first image feature data set is input into the neural network model, which outputs a first predicted camera speed data set; the first predicted camera speed data set is converted into a first predicted robot arm end speed data set according to a first initial hand-eye conversion function, the first initial hand-eye conversion function comprising a first initial camera external parameter; and the weight of the neural network model is adjusted based on the first predicted robot arm end speed data set and the first speed data set.
23. The apparatus of claim 22, wherein if the target speed is a target camera speed,
the acquisition unit is used for:
acquiring a first updated hand-eye conversion function, wherein the first updated hand-eye conversion function comprises a first updated camera external parameter, and the first updated camera external parameter is obtained after updating the first initial camera external parameter based on the first predicted robot arm end speed data set and the first speed data set;
the processing unit is used for:
converting the target camera speed into a target robot arm end speed according to the first updated hand-eye conversion function;
and controlling the robot to move according to the initial image feature, the desired image feature, and the target robot arm end speed.
24. The apparatus of any one of claims 20 to 22, wherein the output of the first visual servo model is a camera speed and the second speed data set is a second robot arm end speed data set;
the second visual servo model is obtained by updating the weight of the first visual servo model based on a second image feature data set and a second speed data set, comprising:
the second image feature data set is input into the first visual servo model, which outputs a second predicted camera speed data set; the second predicted camera speed data set is converted into a second predicted robot arm end speed data set according to a second initial hand-eye conversion function, the second initial hand-eye conversion function comprising a second initial camera external parameter; and the weight of the first visual servo model is updated based on the second predicted robot arm end speed data set and the second speed data set.
25. The apparatus of claim 24, wherein if the target speed is a target camera speed,
the acquisition unit is used for:
acquiring a second updated hand-eye conversion function, wherein the second updated hand-eye conversion function comprises a second updated camera external parameter, and the second updated camera external parameter is obtained after updating the second initial camera external parameter based on the second predicted robot arm end speed data set and the second speed data set;
the processing unit is used for:
converting the target camera speed into a target robot arm end speed according to the second updated hand-eye conversion function;
and controlling the robot to move according to the initial image feature, the desired image feature, and the target robot arm end speed.
26. The apparatus of claim 20 or 21, wherein the output of the first visual servo model is a robot arm end speed and the second speed data set is a second robot arm end speed data set;
the second visual servo model is obtained by updating the weight of the first visual servo model based on a second image feature data set and a second speed data set, comprising:
the second image feature data set is input into the first visual servo model, which outputs a second predicted robot arm end speed data set; and the weight of the first visual servo model is updated based on the second predicted robot arm end speed data set and the second speed data set.
27. An apparatus for obtaining a visual servoing model, comprising a processor and a memory, said memory for storing program instructions, said processor for invoking said program instructions to perform the method of any of claims 1 to 6.
28. A visual servoing device comprising a processor and a memory, said memory for storing program instructions, said processor for invoking said program instructions to perform the method of any of claims 7 to 13.
29. A computer readable storage medium storing program code for execution by a device, the program code comprising instructions for performing the method of any one of claims 1 to 6 or 7 to 13.
30. A chip comprising a processor and a data interface, the processor reading instructions stored on a memory via the data interface to perform the method of any one of claims 1 to 6 or 7 to 13.
CN202210524056.1A 2022-05-13 2022-05-13 Method for acquiring visual servo model, visual servo method and device Pending CN117095310A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210524056.1A CN117095310A (en) 2022-05-13 2022-05-13 Method for acquiring visual servo model, visual servo method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210524056.1A CN117095310A (en) 2022-05-13 2022-05-13 Method for acquiring visual servo model, visual servo method and device

Publications (1)

Publication Number Publication Date
CN117095310A true CN117095310A (en) 2023-11-21

Family

ID=88777673

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210524056.1A Pending CN117095310A (en) 2022-05-13 2022-05-13 Method for acquiring visual servo model, visual servo method and device

Country Status (1)

Country Link
CN (1) CN117095310A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118576977A (en) * 2024-05-22 2024-09-03 广州云近科技有限公司 A method and device for processing game operation data based on player device parameters

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination