
CN118404579A - Robot control method, system, device, equipment and medium - Google Patents

Robot control method, system, device, equipment and medium

Info

Publication number
CN118404579A
Authority
CN
China
Prior art keywords
subtask
robot
navigation
action
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410558752.3A
Other languages
Chinese (zh)
Inventor
黄晓庆
马世奎
张伟
周明才
付强
肖羽佳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cloudminds Shanghai Robotics Co Ltd
Original Assignee
Cloudminds Shanghai Robotics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cloudminds Shanghai Robotics Co Ltd
Priority to CN202410558752.3A
Publication of CN118404579A
Legal status: Pending

Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1628Programme controls characterised by the control loop
    • B25J9/163Programme controls characterised by the control loop learning, adaptive, model based, rule based expert control
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1656Programme controls characterised by programming, planning systems for manipulators
    • B25J9/1664Programme controls characterised by programming, planning systems for manipulators characterised by motion, path, trajectory planning
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Manipulator (AREA)

Abstract

The invention relates to a robot control method, system, device, equipment and medium. The method comprises the following steps: acquiring a task request provided by a user; decomposing the task request with a large model to obtain a navigation subtask and an action subtask, where the navigation subtask is used for guiding the robot to navigate to the target address of any target interaction object, and the action subtask is used for guiding the robot to perform limb actions on that target interaction object to achieve the task target; and sending the navigation subtask and the action subtask, or the corresponding navigation parameters and action parameters, to the robot, so that the robot executes the navigation subtask and the action subtask in sequence according to the interactive execution sequence. The user only needs to provide a task sentence, and the server understands and decomposes the user's task to the extent that it can be understood and executed by the robot. The large model is deployed on the server, which provides sufficient computing power, so the hardware cost of the robot and the investment in on-board computing power can be effectively reduced.

Description

Robot control method, system, device, equipment and medium
Technical Field
The invention relates to the technical field of robots, and in particular to a robot control method, system, device, equipment and medium.
Background
With the popularization of robot technology, robots are becoming increasingly intelligent. Robots are typically equipped with various sensors, including cameras, radars and the like, to increase their working capability.
In the prior art, although various sensors such as cameras and radars are mounted on the robot, cost considerations limit its on-board hardware configuration, such as the local processor, so the amount of data it can process is limited, it is difficult to run a multi-modal large model locally, and the processing results cannot satisfy user requirements well. Even robots that are capable of voice interaction with a user and of performing corresponding tasks still largely rely on the user to provide standardized instructions and explicit tasks. Conversely, it is difficult for a user without a certain level of technical skill to control the robot to perform a complex task. Therefore, a solution that can improve the ability of a robot to perform complex tasks is needed.
Disclosure of Invention
The invention aims to provide a robot control method, system, device, equipment and medium, so as to realize a scheme in which complex tasks are automatically decomposed and executed by a robot.
In order to solve the above technical problems, in a first aspect, the present invention provides a robot control method, which specifically includes the following steps:
Acquiring a task request provided by a user;
Decomposing the task request by using a pre-trained large model to obtain a navigation subtask and an action subtask; the navigation subtask is used for guiding the robot to navigate to a target address of any target interaction object; the action subtask is used for guiding the robot to perform limb actions on any target interaction object to achieve a task target;
and sending the navigation subtask and the action subtask to a robot, or sending navigation parameters corresponding to the navigation subtask and action parameters corresponding to the action subtask to the robot, so that the robot can execute the navigation subtask and the action subtask in sequence according to the interactive execution sequence.
Optionally, the obtaining the task request provided by the user includes:
Receiving the task request in a voice form or a text form provided by the user;
judging whether the task request contains complete subtask elements for guiding task splitting;
if the complete subtask element is not contained, determining a missing target interaction object and/or an interaction element aiming at the target interaction object;
transmitting interaction information to the user based on the absent target interaction object and/or interaction elements for the target interaction object;
And receiving feedback information which is provided by the user based on the interaction information and carries the target interaction object and/or the interaction element aiming at the target interaction object.
Optionally, the decomposing the task request to obtain a navigation subtask and an action subtask includes:
Determining target interaction objects contained in a flow for completing the task request and an interaction execution sequence of the robot and a plurality of the target interaction objects;
And generating the navigation subtask and the action subtask based on the target interaction object and the interaction execution sequence.
Optionally, the sending the navigation subtask and the action subtask to the robot includes:
Sequentially sending the navigation subtasks and the action subtasks to the robot according to the interactive execution sequence; or
And sending the navigation subtask, the action subtask and the interactive execution sequence information to a robot.
Optionally, the method further comprises:
when the navigation subtask and the action subtask are executed according to the interactive execution sequence, if the information of the task target which is not achieved by the robot feedback is received, determining a target interaction object corresponding to the task target which is not achieved;
Generating a search subtask aiming at the target interaction object based on the sensor information fed back by the robot and the unachieved task target;
And sending the searching subtask for searching the target interactive object by using a sensor to the robot.
Optionally, after the task of searching for the target interactive object using a sensor is sent to the robot, the method further includes:
If a search result fed back by the robot for searching the target interactive object is received, regenerating the navigation subtask or the action subtask corresponding to the unachieved task target;
and sending the regenerated navigation subtask or the action subtask to the robot so as to enable the robot to achieve a task.
Optionally, the sending the navigation parameter corresponding to the navigation subtask and the action parameter corresponding to the action subtask to the robot includes:
Generating navigation parameters from the current address of the robot to the target address according to the sensor information provided by the robot and the target address;
generating action parameters for the mechanical end according to the sensor information provided by the robot and the current end pose of the robot's mechanical end;
and sending the navigation parameters and/or the action parameters to the robot according to the interactive execution sequence.
In a second aspect, an embodiment of the present application proposes a robot control method, applied to a robot, the method including:
Receiving a navigation subtask and an action subtask provided by a server, or receiving navigation parameters corresponding to the navigation subtask and action parameters corresponding to the action subtask; the navigation subtask is used for guiding the robot to navigate to a target address of any target interaction object; the action subtask is to guide the robot to realize limb actions aiming at any target interaction object to achieve a task target;
sequentially executing the navigation subtask and the action subtask according to the interactive execution sequence;
And if the execution result is matched with the task request provided by the user, feeding back the end of task execution to the server.
Optionally, the sequentially executing the navigation subtask and the action subtask according to the interactive execution sequence includes:
judging whether the target interaction object is within the effective distance of the robot or not through a sensor;
if the effective distance is within the effective distance, executing the action subtasks, and executing the next action subtask or the navigation subtask according to the interactive execution sequence after the corresponding task target is achieved;
and if the target interaction object is not within the effective distance, executing the navigation subtask, and executing the next action subtask or navigation subtask according to the interactive execution sequence after the corresponding task target is achieved.
Optionally, the executing the action subtask if the target interaction object is within the effective distance includes:
Determining a first relative positional relationship of the target interactive object and a mechanical end of the robot based on a sensor;
determining an action parameter of the mechanical end based on the first relative positional relationship and a current end pose of the mechanical end;
and executing the action subtask based on the action parameter.
Optionally, the executing the navigation subtask if the target interaction object is not within the effective distance includes:
determining a second relative positional relationship between the robot's body position and a target address of the target interactive object based on the sensor;
and executing the navigation subtask based on the second relative positional relationship, so that the target interaction object is within the effective distance of the robot after the robot moves.
Optionally, the determining, by the sensor, whether the target interactive object is within the effective distance of the robot includes:
judging whether the target interaction object is in the visual range of the robot or not through a visual sensor;
If so, determining a first distance based on a second relative positional relationship between the body position of the robot and the target address of the target interactive object;
determining a second distance, namely the maximum length that the mechanical end can reach, based on the body position of the robot;
If the first distance is smaller than the second distance, the target interaction object is within the effective distance of the robot;
And if the first distance is not smaller than the second distance, the target interaction object is not in the effective distance of the robot.
Optionally, the method further comprises: when the navigation subtask and the action subtask are executed according to the interactive execution sequence, if the task target is not achieved, feeding back the information of the task target which is not achieved and a corresponding target interaction object to the server;
receiving a search subtask provided by the server; the search subtask is generated in the following manner: the server generates the search subtask for the target interaction object by using the sensor information fed back by the robot and the unachieved task target.
Optionally, after receiving the search subtask provided by the server, the method further includes: and if the target interaction object is searched, re-executing the navigation subtask or the action subtask corresponding to the unachieved task target so as to ensure that the robot achieves the task.
In a third aspect, an embodiment of the present application proposes a robot control system, the system comprising:
a server, which obtains a task request provided by a user by using a first system layer, decomposes the task request by using a pre-trained large model, generates, by using a second system layer and based on the decomposition result, a navigation subtask and an action subtask that a robot can execute, and sends the navigation subtask and the action subtask to the robot, or sends navigation parameters corresponding to the navigation subtask and action parameters corresponding to the action subtask to the robot;
The third system layer in the robot sequentially executes the navigation subtasks and the action subtasks according to the received interaction execution sequence; or sequentially executing according to the interaction execution sequence according to the navigation parameters corresponding to the navigation subtasks and the action parameters corresponding to the action subtasks; the navigation subtask is used for guiding the robot to navigate to a target address of any target interaction object; the action subtask is to guide the robot to realize limb actions aiming at any target interaction object to achieve a task target.
Optionally, if the system comprises a plurality of robots; the server sends the navigation subtask and the action subtask to a plurality of robots according to the interactive execution sequence;
And after any one of the robots completes the navigation subtask or the action subtask, sending an execution completion subtask result to the server so that the server informs the next robot to execute the assigned navigation subtask and/or action task according to the interactive execution sequence.
In a fourth aspect, an embodiment of the present application proposes a robot control device applied to a server, the device including:
the acquisition module is used for acquiring a task request provided by a user;
The decomposition module is used for decomposing the task request by utilizing the pre-training large model to obtain a navigation subtask and an action subtask; the navigation subtask is used for guiding the robot to navigate to a target address of any target interaction object; the action subtask is to guide the robot to realize limb actions aiming at any target interaction object to achieve a task target;
And the sending module is used for sending the navigation subtask and the action subtask to the robot or sending the navigation parameter corresponding to the navigation subtask and the action parameter corresponding to the action subtask to the robot so that the robot can execute the navigation subtask and the action subtask in sequence according to the interactive execution sequence.
In a fifth aspect, an embodiment of the present application proposes a robot control device applied to a robot, the device comprising:
The receiving module is used for receiving the navigation subtask and the action subtask provided by the server or receiving the navigation parameter corresponding to the navigation subtask and the action parameter corresponding to the action subtask; the navigation subtask is used for guiding the robot to navigate to a target address of any target interaction object; the action subtask is to guide the robot to realize limb actions aiming at any target interaction object to achieve a task target;
the execution module is used for sequentially executing the navigation subtask and the action subtask according to the interactive execution sequence;
and the feedback module is used for feeding back the end of task execution to the server side if the execution result is matched with the task request provided by the user.
In a sixth aspect, an embodiment of the present application proposes an electronic device, including: a memory and a processor; wherein,
The memory is used for storing programs;
The processor is coupled to the memory for executing the program stored in the memory for implementing the method of the first or second aspect.
In a seventh aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed, implements the steps of the method according to the first aspect or the second aspect.
In the embodiment of the application, a task request provided by a user is acquired; the task request is decomposed by using a pre-trained large model to obtain a navigation subtask and an action subtask, where the navigation subtask is used for guiding the robot to navigate to a target address of any target interaction object, and the action subtask is used for guiding the robot to perform limb actions on any target interaction object to achieve a task target; and the navigation subtask and the action subtask are sent to the robot, so that the robot executes them in sequence according to the interactive execution sequence. With this scheme, the user only needs to provide a simple task sentence, and the server understands and decomposes the user's task to the extent that it can be understood and executed by the robot, so the user does not need to perform additional interactions or provide additional prompt words, nor to decompose the task and command the robot step by step. The server is provided with a pre-trained large model and can supply sufficient computing power, so the robot body does not need to carry out much computation; the local computing power requirement is reduced, and the hardware cost of the robot and the investment in computing power can be effectively reduced. Meanwhile, the large model in the server can be adaptively fine-tuned according to the application requirements of different scenes, so that the application requirements of various scenes can be met, and the whole robot control system has better universality. Specifically, the task request provided by the user is decomposed according to the content of the task request and the working capability of the robot, so that a navigation subtask and an action subtask applicable to the corresponding robot are obtained, and these can be sent directly to the robot; alternatively, the server can execute part of the navigation subtasks and part of the action subtasks, output navigation parameters and action parameters, and then send these parameters to the robot. This effectively improves task execution processing capability and accuracy.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, a brief description will be given below of the drawings required for the embodiments or the prior art descriptions, and it is obvious that the drawings in the following description are some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic diagram of a robot control system according to an embodiment of the present application;
Fig. 2a is a schematic diagram of a control architecture of a robot control system according to an embodiment of the present application;
Fig. 2b is a schematic diagram of an execution flow of a robot control system according to an embodiment of the present application;
fig. 3 is a schematic flow chart of a robot control method according to an embodiment of the present application;
FIG. 4 is a schematic illustration of another method for controlling a robot according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a robot control device according to an embodiment of the present application;
Fig. 6 is a schematic structural diagram of another robot control device according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of an electronic device corresponding to the robot control device provided in the embodiment shown in fig. 5 and 6.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the following detailed description of the embodiments of the present application will be given with reference to the accompanying drawings. However, those of ordinary skill in the art will understand that in various embodiments of the present application, numerous technical details have been set forth in order to provide a better understanding of the present application. The claimed application may be practiced without these specific details and with various changes and modifications based on the following embodiments.
Unless the context clearly requires otherwise, throughout the description and the claims, the words "comprise", "comprising", and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is, it is the meaning of "including but not limited to".
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise, the "plurality" generally includes at least two.
The words "if", as used herein, may be interpreted as "at … …" or "at … …" or "in response to a determination" or "in response to a detection", depending on the context. Similarly, the phrase "if determined" or "if detected (stated condition or event)" may be interpreted as "when determined" or "in response to determination" or "when detected (stated condition or event)" or "in response to detection (stated condition or event), depending on the context.
In the prior art, a robot is provided with radars (e.g., a laser radar, an infrared radar, an ultrasonic radar) and vision sensors (e.g., a camera). The robot further includes an inertial measurement unit and the like. On the basis of this hardware, the robot is also provided with path planning and navigation capabilities. However, because the robot's ability to understand human language is limited, it can only perform simple and well-defined tasks; it cannot understand and execute complex tasks, and often requires the assistance of a professional. In some cases, such robots cannot meet the intelligent needs of users. Accordingly, the present application proposes a robot control system and a robot control method capable of performing complex tasks.
Fig. 1 is a schematic diagram of a robot control system according to an embodiment of the present application. As can be seen from fig. 1, the robot control system includes a server and at least one robot controlled by the server. The server can be a cloud server or a local server, has strong computing power, and can run a large language model. Real-time communication can be realized between the server and the robot.
Specifically, the robot control system includes: the server 11 obtains a task request provided by a user by using a first system layer, decomposes the task request by using a pre-training large model, generates a navigation subtask and an action subtask which can be executed by the robot based on a decomposition result by using a second system layer, and sends the navigation subtask and the action subtask to the robot 12, or sends navigation parameters corresponding to the navigation subtask and action parameters corresponding to the action subtask to the robot 12.
A third system layer in the robot 12 sequentially executes the navigation subtasks and the action subtasks according to the received interaction execution sequence; or sequentially executing according to the interaction execution sequence according to the navigation parameters corresponding to the navigation subtasks and the action parameters corresponding to the action subtasks; the navigation subtask is used for guiding the robot to navigate to a target address of any target interaction object; the action subtask is to guide the robot to realize limb actions aiming at any target interaction object to achieve a task target.
For easy understanding, the following illustrates a control flow of the robot control system, and fig. 2a is a schematic diagram of a control architecture of the robot control system according to an embodiment of the present application.
The control system in the scheme of the application has a highly universal three-layer architecture: the upper layer (i.e. the first system layer) at the server side is mainly responsible for strategic planning, the middle layer (i.e. the second system layer) is mainly responsible for tactical planning, and the lower layer (i.e. the third system layer) at the robot is mainly responsible for closed-loop control. The task planning and decomposition capability of the server side is decoupled from the bottom-level execution capability of the robot: the user's task is decomposed by the cloud server, and the full-flow control of the target task, which comprises a series of continuous operations such as splitting subtasks, defining subtask targets and actually executing subtasks, is then actually completed by the robot. This architecture is applicable to various robot control tasks that involve chassis control and joint control capabilities.
The scheme of the application adopts a cloud-end cooperation scheme that balances efficiency and precision: the server can be a server deployed on the cloud side, and the "end" refers to the robot that can exchange data with the cloud server. When controlling the robot, tasks related to a high-precision large language model (Large Language Model, LLM) with higher computing power requirements are deployed in the cloud, solving the "strategy" and "tactics" problems of splitting subtasks and defining subtask targets, which have a lower decision-making frequency; the models with smaller computing power consumption that actually control the robot's motion are deployed on the robot end side, solving the "execution" problem of actual tasks such as actual joint angle and speed control. This guarantees execution efficiency while reducing the computing power requirement on the robot end side and the hardware cost of the robot.
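For illustration only, the following is a minimal sketch of how the cloud-end split described above might be organized in code. All class and function names (StrategicPlanner, TacticalPlanner, OnRobotController) and the injected model callables are assumptions for this sketch, not part of the disclosure; the large models themselves are treated as opaque callables.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Subtask:
    kind: str         # "navigation" or "action"
    description: str  # natural-language goal passed to the lower layers

class StrategicPlanner:
    """Upper layer (cloud): low-frequency task decomposition with an LLM."""
    def __init__(self, llm: Callable[[str], List[Subtask]]):
        self.llm = llm  # opaque call into the pre-trained large model

    def decompose(self, task_request: str) -> List[Subtask]:
        return self.llm(task_request)

class TacticalPlanner:
    """Middle layer (cloud): turns each subtask into a concrete target
    (target address or end pose) using VLN/VLA-style models."""
    def __init__(self, vln: Callable, vla: Callable):
        self.vln, self.vla = vln, vla

    def plan(self, subtask: Subtask, sensor_info: dict):
        model = self.vln if subtask.kind == "navigation" else self.vla
        return model(subtask.description, sensor_info)

class OnRobotController:
    """Lower layer (robot): high-frequency closed-loop tracking of the
    reference targets sent down from the cloud."""
    def step(self, target, sensor_info: dict) -> dict:
        # In a real system this runs at control rate and outputs chassis
        # velocities or joint angles; here it is only a stub.
        return {"cmd": "track", "target": target}
```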
As shown in fig. 2a, the upper layer (i.e., the first system layer) is mainly responsible for "strategic planning": receiving the user's task request and splitting the total task posed by the user into multiple subtasks. As an alternative, a visual language large model is used for task decomposition. In use, the visual language large model integrates multi-modal information for reasoning; the input of the model is the user's task request in speech or text form together with the current robot sensor information (RGBD camera, radar, force sensor and the like), and the output is in speech or text form. Specifically, if the user's task request is ambiguous, the visual language large model outputs questions and conducts multiple rounds of dialogue with the user to obtain an explicit total task, and then decomposes the total task into subtasks. If the user's task request is clear, the visual language large model directly outputs the subtasks decomposed from the total task. The visual language large model can be obtained by applying prompt engineering to a current open-source large language model; incorporating the robot's bottom-level skills into the prompt yields better results.
Specifically, prompt engineering refers to guiding a model to generate the desired output by designing an appropriate prompt text when performing tasks with a large language model. The prompt text, which may be, for example, a sentence, a question, or a complete instruction, provides the model with background information about the task and the required behavioural guidance, so that the model better understands the task requirements and generates corresponding results. In a visual language large model, the robot's underlying skills can be incorporated through prompt engineering: by designing appropriate prompts, the model can generate suitable action sequences according to the images and specific language instructions, thereby realizing accurate navigation or action control. The design of the prompt needs to combine the specific task with the characteristics of the model. One common strategy is to include task-related keywords or instructions in the prompt text to express the desired task requirements explicitly. In addition, the level of detail of the prompt may be chosen according to the complexity of the task and the expressive power of the model: a more concise prompt may suit a simple task, while a more detailed and instructional prompt may be required for a more complex task. A well-designed prompt helps the language model understand and execute tasks better and improves the performance and output quality of the model, which makes the visual language model more practical and applicable in the fields of robot navigation and motion control.
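As an illustration of the prompt engineering described above, the sketch below builds a decomposition prompt that embeds a list of low-level robot skills. The skill names and the exact wording are assumptions for this example, not the prompt actually used by the disclosed system.

```python
# Illustrative only: the skill list and prompt wording are assumptions.
ROBOT_SKILLS = ["navigate_to(place)", "grasp(object)", "place(object, place)",
                "pour(container, target)", "press(button)"]

def build_decomposition_prompt(task_request: str, sensor_summary: str) -> str:
    skills = "\n".join(f"- {s}" for s in ROBOT_SKILLS)
    return (
        "You control a service robot. It can only use these low-level skills:\n"
        f"{skills}\n"
        f"Current sensor summary: {sensor_summary}\n"
        f"User task: {task_request}\n"
        "If the task is ambiguous, reply with one clarifying question.\n"
        "Otherwise, reply with an ordered list of navigation and action "
        "subtasks, each expressed with the skills above."
    )

print(build_decomposition_prompt("I want a cup of iced coffee",
                                 "robot at service counter; coffee machine visible"))
```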
As shown in fig. 2a, the middle layer (i.e., the second system layer) performs "tactical planning" and is mainly used for defining subtask targets. This layer mainly includes two large models: a visual language navigation large model (VLN) for navigation tasks, and a visual language action large model (VLA) for robot joint control.
The visual language navigation large model (VLN) is a machine learning model that combines visual perception with natural language understanding to accomplish navigation tasks. The goal of the model is to enable the robot to navigate accurately to the target location based on the input visual images and natural language instructions (e.g., "walk forward to the side of the table"). The inputs of the VLN are the subtasks (including navigation subtasks and action subtasks) decomposed by the upper layer (i.e. the first system layer), together with the position information and visual information acquired by the robot's current sensors (radar, camera and the like). The VLN extracts visual features from the input sensor images, preprocesses and extracts features from the natural language provided by the user, and finally outputs control information that the bottom layer (i.e. the third system layer) of the robot can understand and execute; based on this control information, the position the robot should move to and its relative distance to the target position are determined. For example, if the task target of the navigation subtask requires the robot to move to the front of the table, the model outputs control information the robot can execute, such as "move forward 0.5 m, move left 0.4 m" (of course, if the robot has autonomous navigation capability, the server can simply issue the navigation subtask and the robot plans the navigation route itself). It should be noted that, in order to better meet the robot's navigation requirements, a semantic map may be maintained in the VLN, storing the specific positions of objects previously found in the current space. For example, in an unfamiliar scene, the VLN can continuously explore the environment to refine the semantic map, and then find the target interaction object. The VLN can be obtained by fine-tuning an open-source open-domain detection and segmentation large model to meet the requirements of the actual application scene.
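The following is a hedged sketch of the VLN layer's interface and of the semantic map mentioned above, under simplifying assumptions: the model itself would be a learned component, the map is a plain name-to-coordinates dictionary, and the robot's heading is assumed aligned with the map axes. All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

@dataclass
class SemanticMap:
    # remembers where previously seen objects are in the current space
    positions: Dict[str, Tuple[float, float]] = field(default_factory=dict)

    def remember(self, name: str, xy: Tuple[float, float]) -> None:
        self.positions[name] = xy

    def lookup(self, name: str) -> Optional[Tuple[float, float]]:
        return self.positions.get(name)

def vln_step(target_name: str, robot_xy: Tuple[float, float],
             smap: SemanticMap) -> dict:
    """Return control info the bottom layer can execute, e.g. the text's
    'move forward 0.5 m, move left 0.4 m', expressed as relative offsets."""
    target = smap.lookup(target_name)
    if target is None:
        return {"explore": True}          # keep exploring to refine the map
    dx, dy = target[0] - robot_xy[0], target[1] - robot_xy[1]
    return {"move_forward_m": dx, "move_left_m": dy}

smap = SemanticMap()
smap.remember("table", (0.5, 0.4))
print(vln_step("table", (0.0, 0.0), smap))  # {'move_forward_m': 0.5, 'move_left_m': 0.4}
```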
The visual language action large model (VLA) is a model for robot joint control. It uses visual perception and natural language understanding to parse human language input and translate it into joint control commands for the robot, thereby achieving fine motion control. The inputs of the VLA are the subtasks (including navigation subtasks and action subtasks) decomposed by the upper layer (i.e. the first system layer), the position information and visual information acquired by the robot's current sensors (radar, camera and the like), and the current end pose representing the state of the robot body's mechanical end; the output is control information for bringing the mechanical end to the target pose. For example, when the task target is grabbing an apple, if the robot itself has recognition and grabbing capability, the server can directly issue the subtask to the robot, and the robot then plans the action according to the recognition result. The model can be obtained by fine-tuning an open-source VLA model, so that it better meets the requirements of the actual application scene. It should be noted that the specific control information differs for different types of mechanical ends; for example, the mechanical end may take various forms such as an arm or a sucker, and different mechanical arms have different numbers of axes. The control information output by the VLA here is merely an example and does not limit the technical scheme of the present application.
For the navigation subtask and the action subtask decomposed by the upper layer, the middle layer judges, according to the sensor information provided by the robot, whether the target interaction object is within the field of view and the operable range (i.e. the effective distance). If not, the VLN model is used to navigate in space, i.e. a navigation subtask for searching for the target interaction object is executed: the target interaction object is searched for and the robot moves to a suitable distance from it (i.e. within the effective distance). If the target interaction object is already within the field of view and the operable range (effective distance), the VLA is used for motion planning other than the chassis.
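A compact sketch of this dispatch rule is shown below. The perception helper is a placeholder standing in for real sensor processing, and the numeric values are illustrative assumptions.

```python
def in_field_of_view(target: str, camera_detections: set) -> bool:
    return target in camera_detections          # placeholder for a real detector

def middle_layer_dispatch(target_object: str, camera_detections: set,
                          distance_to_target: float, reach_limit: float) -> str:
    visible = in_field_of_view(target_object, camera_detections)
    within_reach = distance_to_target < reach_limit   # "effective distance"
    if visible and within_reach:
        return "VLA"   # plan non-chassis motion (arm / joints)
    return "VLN"       # search or navigate until within the effective distance

print(middle_layer_dispatch("apple", {"apple", "table"}, 1.2, 0.8))  # VLN
print(middle_layer_dispatch("apple", {"apple"}, 0.5, 0.8))           # VLA
```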
For robot control, the output of the middle-layer task, i.e. the task target of the subtask, can also be understood as a target address or an end pose. For example, for a navigation subtask, the middle layer gives the target address as coordinates (x, y, theta (orientation)); for an action subtask, the middle layer outputs the 6D end pose (x, y, z, roll, pitch, yaw). It should be noted that the content of the end pose output differs for different types of robots; for example, for the control of a complex structure (such as a bipedal/quadruped robot), the middle layer outputs the pose of the whole-body joints. This generic end pose allows the bottom layer to use a generic model to accomplish the above tasks, i.e. the bottom-layer controller is mainly aimed at correcting and tracking these reference trajectories, ensuring motion balance, smoothness and collision avoidance. Because the middle layer is responsible for decomposing the subtasks into target addresses and end poses, the server can adapt to different application scenes by fine-tuning the large model to meet different application requirements, and the robot does not need to be retrained or substantially modified. The robot control system therefore has better universality.
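The middle-layer outputs enumerated above can be written as plain data types, as in the sketch below; the field names follow the description, while the class names are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class NavigationTarget:       # output for a navigation subtask
    x: float
    y: float
    theta: float              # orientation

@dataclass
class EndPose6D:              # output for an action subtask
    x: float
    y: float
    z: float
    roll: float
    pitch: float
    yaw: float

@dataclass
class WholeBodyTarget:        # e.g. for bipedal/quadruped robots
    joint_positions: List[float]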
As shown in fig. 2a, the "closed-loop control" at the bottom layer (i.e. the third system layer) of the robot end is mainly used for controlling the movement of the robot in real time according to the task targets and the sensor information of the robot, which are respectively corresponding to the navigation subtasks and the action subtasks output from the middle layer. When the robot is controlled, a closed-loop control strategy (chassis movement obstacle avoidance controller and joint movement controller) can be obtained by a reinforcement learning training mode based on manual teaching data to treat movement and obstacle avoidance problems, and a safety daemon can smoothly act and scram to ensure safety. The closed-loop control strategy is obtained by reinforcement learning training based on manual teaching data, and is input into data of all sensors (radar, vision, force, speed and the like) of a robot body and data of a target address, a tail end gesture and the like corresponding to a task target output by a middle layer, and the data are output into angular speed and linear speed (chassis control) or joint rotation angle (joint control), so that obstacle avoidance operation in the actual navigation process and the action process is completed.
In addition, as shown in fig. 2a, the system comprises a cloud server side and a robot side. The cloud server has sufficient computing power and can host the large-volume upper-layer and middle-layer models, so it is used to execute the complex computation tasks. Moreover, the frequency at which the cloud server outputs decisions to the robot is low and the amount of data in the output results is small, so transmitting these results and other limited data between the cloud server and the robot does not significantly affect the overall result of the task executed by the robot. The robot side is used to deploy the bottom-level control models, which are small in size and have low computing power requirements, so no excessive hardware is needed and the models can easily be deployed on the robot. These models actually control the robot's motion, their decision frequency is high, and they need high-frequency sensor information from the robot side; deploying them directly on the robot saves data transmission time and significantly improves overall operating efficiency.
The large language model (Large Language Model, LLM) referred to herein refers to a language model having a huge number of parameters and a high degree of expressive power. Large language models are often used to handle natural language processing (Natural Language Processing, NLP) tasks such as text generation, machine translation, question answering, human-machine interaction, and the like.
A large language model generates text with a certain semantic accuracy by learning language patterns and statistical rules from large-scale text datasets. Trained with deep neural networks on large datasets, with a multi-layered hidden-unit structure, it can handle a very large input space. Although the large language model runs on a server, to make it flexible in application, training usually follows a pre-training and fine-tuning approach. In the pre-training phase, the model learns language knowledge and context understanding capabilities from large-scale text data through unsupervised learning; common pre-training methods include language modeling (Language Modeling) and masked language modeling (Masked Language Modeling). In the fine-tuning stage, the model is trained with supervision on a task-specific dataset to adapt to specific task requirements. A fine-tuned large language model can better meet the application requirements of different scenes, generate high-quality and coherent text, better understand the context and semantic relations of input texts from various specific scenes, and generate output better suited to different scenes.
As an alternative, if the system comprises a plurality of robots; the server sends the navigation subtask and the action subtask to a plurality of robots according to the interactive execution sequence; and after any one of the robots completes the navigation subtask or the action subtask, sending an execution completion subtask result to the server so that the server informs the next robot to execute the assigned navigation subtask and/or action task according to the interactive execution sequence.
In practical application, if a plurality of robots are in the robot control system, a server is used as a coordination and dispatch center to perform task allocation on the plurality of robots. That is, after receiving the task request from the user, the server distributes the sub-tasks obtained by decomposition to the corresponding robots according to the working capacity and working state of each robot, and the plurality of robots cooperate with each other to complete the task arranged by the user.
Multiple robots also need to cooperate with each other when performing tasks. For example, to improve search efficiency, two robots may be arranged to simultaneously perform the search task (i.e. the navigation subtask described above) for the same target interaction object. The two robots share search information through the server, ensuring that their search ranges do not overlap, and after one robot finds the target interaction object, the other robot is notified to terminate its search in time. The result of a robot achieving a task target is reported to the server, so that the server can notify the robot executing the next subtask that it can begin executing the scheduled subtask.
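A hedged sketch of this cooperative search follows: the server assigns disjoint regions and stops the remaining robot once one of them reports the object found. The robot identifiers, region names and message shapes are assumptions for illustration.

```python
from typing import Dict, List, Optional

class SearchCoordinator:
    def __init__(self, robots: List[str], regions: List[str]):
        # assign non-overlapping search regions so coverage is not repeated
        self.assignment: Dict[str, str] = dict(zip(robots, regions))
        self.found_by: Optional[str] = None

    def report_found(self, robot_id: str, target: str) -> Dict[str, str]:
        self.found_by = robot_id
        self.found_target = target
        # tell every other robot to terminate its search in time
        return {r: "terminate_search" for r in self.assignment if r != robot_id}

coord = SearchCoordinator(["robot_a", "robot_b"], ["kitchen", "dining_room"])
print(coord.assignment)
print(coord.report_found("robot_b", "coffee_cup"))   # {'robot_a': 'terminate_search'}
```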
For ease of understanding, the task execution flow of the robot control system will be illustrated by way of example. Fig. 2b is a schematic diagram of an execution flow of a robot control system according to an embodiment of the present application.
After the task request is received, the upper layer splits the overall task into a plurality of subtasks. The corresponding subtasks are then executed by the server or the robot: navigation subtasks are handled with the VLN model and action subtasks with the VLA model. After the robot receives the navigation parameters and/or the action parameters, it is further determined whether the task target has been achieved. If so, the current subtask ends and it is further judged whether all subtasks are finished; if they are, the flow ends, and if not, the next subtask is executed according to the interactive execution sequence. While executing the subtasks, the bottom layer of the robot controls the robot's motion according to the navigation parameters and action parameters provided by the middle layer.
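The flow of fig. 2b can be summarized in a compact sketch, shown below. The split, execution and recovery callables are placeholders for the layers described above (recovery corresponds to the search subtask regenerated by the server when a task target is not achieved); the function names are assumptions.

```python
from typing import Callable, List

def run_task(task_request: str,
             split: Callable[[str], List[dict]],
             execute_subtask: Callable[[dict], bool],
             recover: Callable[[dict], None]) -> bool:
    subtasks = split(task_request)              # upper layer splits the total task
    for subtask in subtasks:                    # interactive execution sequence
        achieved = execute_subtask(subtask)     # VLN/VLA targets + bottom-layer control
        if not achieved:
            # e.g. the server generates a search subtask and re-issues the
            # failed navigation/action subtask, as described above
            recover(subtask)
            execute_subtask(subtask)
    return True                                 # all subtasks finished
```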
With this scheme, the user only needs to provide a simple task sentence, and the server understands and decomposes the user's task to the extent that it can be understood and executed by the robot, so the user does not need to perform additional interactions or provide additional prompt words, nor to decompose the task and command the robot step by step. The server is provided with a pre-trained large model and can supply sufficient computing power, so the robot body does not need to carry out much computation; the local computing power requirement is reduced, and the hardware cost of the robot and the investment in computing power can be effectively reduced. Meanwhile, the large model in the server can be adaptively fine-tuned according to the application requirements of different scenes, so that the application requirements of various scenes can be met, and the whole robot control system has better universality. Specifically, the task request provided by the user is decomposed according to the content of the task request and the working capability of the robot, so that a navigation subtask and an action subtask applicable to the corresponding robot are obtained, and these can be sent directly to the robot; alternatively, the server can execute part of the navigation subtasks and part of the action subtasks, output navigation parameters and action parameters, and then send these parameters to the robot. This effectively improves task execution processing capability and accuracy.
Fig. 3 is a schematic flow chart of a robot control method according to an embodiment of the present application. The method is applied to a server, which may be a local server or a cloud server. As can be seen from fig. 3, the method comprises the following steps:
Step 301: and acquiring a task request provided by a user.
Step 302: decomposing the task request by utilizing the pre-training large model to obtain a navigation subtask and an action subtask; the navigation subtask is used for guiding the robot to navigate to a target address of any target interaction object; the action subtask is to guide the robot to realize limb actions aiming at any target interaction object to achieve a task target.
Step 303: and sending the navigation subtask and the action subtask to a robot, or sending navigation parameters corresponding to the navigation subtask and action parameters corresponding to the action subtask to the robot, so that the robot can execute the navigation subtask and the action subtask in sequence according to the interactive execution sequence.
In practical applications, there are various ways in which a user may provide a task request. For example, the user may directly make a task request to the robot in the form of speech or text (e.g., help me make coffee), and the task request is sent by the robot to the server. For another example, the user may send a task request in voice or text form directly to the server via the user terminal (e.g., the user sends a voice "help me make coffee", or the user places an order for a single cup of coffee via terminal APP).
The pre-trained large model is the large language model (LLM) described above; reference is made to the foregoing, and details are not repeated here. It should be noted that, in order to give the large model more accurate task decomposition capability, targeted fine-tuning is needed for different application scenes; for example, when the robot is applied to a kitchen cooking scene, targeted training with cooking recipes is needed.
The term "target interactive object" as used herein is understood to mean an object with which it is desired to control the robot to interact. The interaction may take many forms, such as touching, grabbing, dropping, pulling, pushing, squeezing, pressing, rotating, striking, etc., may be performed by a robotic arm or other robotic component (e.g., magnetic attraction component, water spray component, fire spray component, electric shock component, etc.), which are not illustrated herein.
The navigation subtask is a subtask for guiding the robot to navigate to the target address of any target interaction object. When a robot performs a complex overall task, various target interaction objects in the environment may be used to achieve the task goal, such as coffee making, coffee beans (one target interaction object), water (another target interaction object), cups (yet another target interaction object), and so on. These different target interaction objects all have their own target addresses. Some target interaction objects are directly accessible to the robot arm, and some are not in the robot arm coverage area, so that the robot needs to move the body position to the target address to be accessible. Therefore, the server is required to decompose to obtain a navigation subtask, the navigation subtask is issued to the robot, and the robot executes the navigation subtask; or the control information such as the target address and the movement parameter which are obtained based on the decomposition of the navigation subtask is issued to the robot, and the robot reaches the target address after executing the corresponding control information.
The navigation parameters corresponding to the navigation subtasks may be parameters that can be executed by moving parts (such as wheels, bionic legs, tracks, etc.) in the robot, and the robot can reach the target address after executing the movement according to the navigation parameters. For example, the navigation parameter may include at least one of a movement speed, a movement direction, and a movement distance.
The action subtask guides the robot to perform, for any target interaction object, the limb actions needed to achieve the task target. When the target interaction object is within the effective distance of the robot, the mechanical end of the robot's mechanical arm (mechanical part) can interact with the target interaction object normally, for example picking up a water cup. When generating the action subtask, the corresponding interaction elements are determined by fully considering the characteristics of the target interaction object and the task content. In other words, the interaction elements differ for different target interaction objects. For example, the interaction elements for picking up a water cup are controlling the mechanical end to grip the cup and lift it; the interaction elements for cutting an apple are controlling the mechanical end to hold the apple and rotate the apple or the knife; and the interaction elements for adding sugar to water are cutting open the sugar bag and then tilting it downward.
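For illustration, the navigation parameters and per-object interaction elements just described can be represented as simple data shapes, as sketched below; the field names and skill labels are assumptions rather than a prescribed format.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class NavigationParams:
    speed_m_s: float        # movement speed
    heading_rad: float      # movement direction
    distance_m: float       # movement distance

# interaction elements differ per target interaction object
INTERACTION_ELEMENTS: Dict[str, List[str]] = {
    "water_cup": ["grip", "lift"],
    "apple":     ["grip", "rotate_against_knife"],
    "sugar_bag": ["cut_open", "tilt_downward"],
}
```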
When the task request is decomposed, the decomposition is based on the overall task achievement flow specified by the task request (the flow includes the target interaction objects required in each step of the task) and the capabilities of the robot. During task decomposition, the large model preprocesses the input natural-language instruction and extracts features. This typically includes word segmentation, part-of-speech tagging, and word-vector representation, so that the model can understand the meaning of the instruction. For example, the large model needs to understand the meaning of the user's "I want a cup of iced coffee", including target interaction objects such as the coffee cup, water and ice, and the flow of making coffee (i.e., the order in which the robot interacts with these target interaction objects and their interaction relationships).
Further, the decomposed navigation subtasks and action subtasks may be transmitted to the robot, which requires the robot to have subtask understanding and execution capabilities. For example, the navigation subtask is to navigate to a target address of the coffee machine.
As an alternative, in order to reduce the computational burden of the robot, the robot may not be endowed with understanding capability, that is, a high computational power chip does not need to be configured, thereby reducing the cost of the robot. The server further executes and decomposes the navigation subtask and the action subtask, and outputs navigation parameters corresponding to the navigation subtask and action parameters corresponding to the action subtask. The specific implementation of the decomposition method will be described in the following embodiments, and the detailed description will not be repeated here.
After receiving the navigation subtask and the action subtask or the navigation parameter corresponding to the navigation subtask and the action parameter corresponding to the action subtask, the robot further executes according to the interactive execution sequence. It should be noted that, the interactive execution sequence of different tasks (for example, coffee making and egg frying) is different, and the interactive execution sequence of robots with different achievement capacities is different.
With this scheme, the server reasonably decomposes the task request, and the robot executes the decomposed subtasks without needing high computing power. The computational load of the robot is reduced, and the cost of the robot is reduced. Meanwhile, the server is provided with a large model that can accurately understand the user's task request and accurately split the task, which can effectively improve task processing capability and processing efficiency.
In one or more embodiments of the present application, the obtaining a task request provided by a user includes:
Receiving the task request in a voice form or a text form provided by the user;
judging whether the task request contains complete subtask elements for guiding task splitting;
if the complete subtask element is not contained, determining a missing target interaction object and/or an interaction element aiming at the target interaction object;
transmitting interaction information to the user based on the absent target interaction object and/or interaction elements for the target interaction object;
And receiving feedback information which is provided by the user based on the interaction information and carries the target interaction object and/or the interaction element aiming at the target interaction object.
A subtask element may be one of the target interaction objects required to achieve the task target, or an interaction element for a target interaction object. For example, suppose the overall task is for the robot to provide a cup of latte to a customer. If the customer does not mention sugar, one of the task elements is missing and the subtask of fetching sugar cannot be derived, so the customer needs to be asked again whether sugar should be added. An interaction element may also be a state of the target interaction object, such as its softness, temperature, height, weight, size or tightness.
After receiving the task request, the server first determines whether the task request is clear and complete, that is, whether it describes a task that can be executed successfully and whose result will meet the user's expectation. The necessary subtask elements differ between task requests. If the task request provided by the user contains all the subtask elements, task decomposition can be performed directly.
If the task request provided by the user lacks one or more subtask elements, the missing subtask elements need to be completed. These subtask elements include target interaction objects (e.g., water, kiwi fruit) and/or interaction elements directed to the target interaction objects (e.g., whether the water is chilled, the size of the kiwi fruit), and they must be provided by the user. Thus, after the missing target interaction object and/or interaction element is determined, interaction information may be generated and sent to the user in text or voice form. The interaction information can be played or displayed to the user through the robot, or sent to the user's terminal device (such as a mobile phone or a computer). Generally, the interaction information is sent to the device from which the task request was provided; in other words, the channel through which the user sends the task request is the same as the channel through which the interaction information is received, only the data transmission direction is opposite.
After the interaction information is sent to the user, the user's feedback information is received. If the feedback information contains all the missing subtask elements, no further interaction is required. If it does not, the server confirms which subtask elements are still missing, generates the next piece of interaction information and sends it to the user again, and repeats this until the user has provided all the missing subtask elements.
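A minimal sketch of this completion loop is shown below; the helpers extract_elements and ask_user are hypothetical stand-ins for the large-model parsing step and for whichever channel (robot speech or terminal device) carries the interaction information:

def complete_task_elements(request, required, extract_elements, ask_user):
    # Parse whatever elements the initial request already contains.
    elements = extract_elements(request)          # e.g. {"drink": "latte"}
    missing = required - elements.keys()
    while missing:
        # Ask the user only for the elements that are still missing.
        reply = ask_user("Please specify: " + ", ".join(sorted(missing)))
        elements.update(extract_elements(reply))
        missing = required - elements.keys()      # re-check until complete
    return elements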
The server may interact with the user and ensure that complete task elements are obtained in order to properly break down tasks and provide services that meet the user's expectations. The interactive process can improve the accuracy of the dialogue and ensure that the user requirements are met.
In one or more embodiments of the present application, the decomposing the task request to obtain a navigation subtask and an action subtask includes: determining target interaction objects contained in a flow for completing the task request and an interaction execution sequence of the robot and a plurality of the target interaction objects; and generating the navigation subtask and the action subtask based on the target interaction object and the interaction execution sequence.
After receiving the task request and judging its completeness, the server needs to determine the target interaction objects involved in executing the task. For example, if the user wants to eat a sweetened kiwi fruit, the task involves target interaction objects such as the kiwi fruit, a cutter and sugar. These target interaction objects are the main reference basis for task decomposition. After the target interaction objects are determined, the execution order of the robot's interactions with them needs to be determined. For example, if the task request involves cutting the kiwi fruit and preparing other materials, the interactive execution sequence may be to cut the kiwi fruit first and then perform the other preparation steps (e.g., opening a sugar bag), so that the task proceeds as the user expects. Based on the determined target interaction objects and the interaction execution sequence, a navigation subtask may be generated to instruct the robot to move to the target address where a target interaction object is located during task execution. For example, if the kiwi fruit is cut in the kitchen and the sugar is prepared in the restaurant, a navigation subtask may be generated to direct the robot between the restaurant and the kitchen accordingly. In addition to the navigation subtask, corresponding action subtasks also need to be generated to perform the actual operations. In the kiwi-fruit cutting step, the action subtask may include taking the cutter and cutting the kiwi fruit into pieces; in the other preparation steps, the action subtask may include obtaining the required food material, preparing the implement, and the like. The generation of these action subtasks may be based on a large model, rules, or a predefined instruction set. Through this scheme, the server can generate the corresponding navigation subtasks and action subtasks according to the target interaction objects and the interaction execution sequence in the task request, so as to realize the execution of the task. This task decomposition and subtask generation process ensures that the robot performs the task as the user expects and provides a service that meets the user's expectation.
When task decomposition is performed, it can be based on the target interaction objects as in the above scheme. In some cases, the task achievement capability of the robot or the functions it supports are also considered. For example, a first robot A1 has only a mechanical end capable of performing various operations, while a second robot A2 additionally has a container for heating water (the container may be used for heating various articles such as tea and coffee). When a subtask of heating water for coffee needs to be performed, the subtasks decomposed for the first robot A1 include: picking up the cup, filling the cup with water, pouring the water into the coffee machine, returning the cup, and pressing the heating button of the coffee machine; for the second robot A2, the decomposed subtasks include: filling the container with water and starting the self-heating function, which saves the cup pick-and-place actions and the water pouring action. Obviously, the same overall task is decomposed into different subtasks for different robots. Therefore, in addition to the target interaction objects, the total task is decomposed based on the achievement capability of the robot or the functions it supports, so as to generate the navigation subtasks, the action subtasks and the interaction execution sequence applicable to that robot.
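The capability-dependent expansion can be illustrated with the A1/A2 example above; the capability flags and subtask strings in this Python sketch are assumptions for illustration only:

def expand_step(step, capabilities):
    # Expand one step of the overall flow into robot-specific subtasks.
    if step == "heat water for coffee":
        if "self_heating_container" in capabilities:          # robot A2
            return [{"type": "action", "do": "fill built-in container"},
                    {"type": "action", "do": "start self-heating"}]
        return [{"type": "action", "do": "pick up cup"},       # robot A1
                {"type": "action", "do": "fill cup with water"},
                {"type": "navigation", "target": "coffee machine"},
                {"type": "action", "do": "pour water into machine"},
                {"type": "action", "do": "press heating button"}]
    return [{"type": "action", "do": step}]                    # default: as-is

subtasks_a1 = expand_step("heat water for coffee", {"mechanical_end"})
subtasks_a2 = expand_step("heat water for coffee", {"self_heating_container"})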
In one or more embodiments of the present application, the sending the navigation subtask and the action subtask to a robot includes: sequentially sending the navigation subtasks and the action subtasks to a robot according to an interactive execution sequence; or sending the navigation subtask, the action subtask and the interactive execution sequence information to a robot.
In the manner of sequentially sending the navigation subtask and the action subtask, the robot receives and executes the navigation subtasks and action subtasks one by one, in the order given by the interactive execution sequence. Specifically, the robot first receives the navigation subtask, moves and positions itself according to the position and navigation information it contains, then receives the corresponding action subtask and executes the actual operation. The advantage of this method is that a simpler message transmission mechanism can be used: one subtask is sent at a time, and the robot proceeds to the next subtask only after the current one has been completed. This sequential execution ensures that the robot completes the task step by step in the order expected by the user, reducing the likelihood of confusion and error.
In the manner of sending the navigation subtask, the action subtask and the interactive execution order information, the interactive execution order information is transmitted to the robot in addition to the subtasks themselves. Thus, when the robot receives the task it knows the execution order of the whole task: it can first parse the interactive execution order information and then execute the task according to the corresponding navigation subtasks and action subtasks. The advantage of this approach is that the robot obtains the complete task execution sequence before starting to execute, and can plan and prepare in advance. This allows movements and operations to be scheduled more efficiently and a comprehensive understanding of task progress to be maintained throughout execution.
The method for sequentially sending the navigation subtasks and the action subtasks is simpler and more direct, and is suitable for scenes with simpler tasks and clear sequences. The mode of sending the navigation subtask, the action subtask and the interactive execution sequence information is more suitable for complex tasks, and can help the robot to integrate information and plan actions better and improve the execution efficiency of the tasks. The specific choice of which way depends on the nature of the task, the task achievement capabilities of the robot and the requirements of the specific application scenario.
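The two delivery modes can be sketched as follows; transport.send and wait_for_done stand in for whatever server-robot communication channel and completion signal are used (assumptions, not part of the embodiment):

def send_sequentially(transport, subtasks, wait_for_done):
    # Mode 1: one subtask at a time; the next is sent only after the robot
    # reports that the current subtask has been completed.
    for task in subtasks:                      # subtasks already ordered
        transport.send(task)
        wait_for_done(task)

def send_batched(transport, subtasks):
    # Mode 2: all subtasks plus explicit execution-order information at once,
    # so the robot can plan and prepare before it starts executing.
    transport.send({"subtasks": subtasks,
                    "execution_order": list(range(len(subtasks)))})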
In one or more embodiments of the present application, further comprising: when the navigation subtask and the action subtask are executed according to the interactive execution sequence, if the information of the task target which is not achieved by the robot feedback is received, determining a target interaction object corresponding to the task target which is not achieved;
Generating a search subtask aiming at the target interaction object based on the sensor information fed back by the robot and the unachieved task target;
And sending the searching subtask for searching the target interactive object by using a sensor to the robot.
In practical application, while the navigation subtask and the action subtask are executed according to the interactive execution sequence, if a task target is not achieved, the robot feeds this back to the server, and the server needs to determine the target interaction object for which the target was not achieved. For example, if the cutter for cutting the kiwi fruit is not found in the kitchen, the cutter becomes the target interaction object whose task target is not achieved. The server may generate a search subtask for this target interaction object based on the sensor information fed back by the robot and the unachieved task target. Taking the cutter as an example, the server can analyze the images acquired by the robot's vision sensor, determine the kitchen area in which the cutter is likely to be, and generate a search subtask that guides the robot to move from the restaurant to the kitchen to look for it. Finally, the server sends this search subtask to the robot for execution. The robot may use the navigation subtask to guide its movement from the current location (the restaurant) to the kitchen, and rely on the sensor information to search for the target interaction object (the cutter). During the search, the robot continuously looks for the cutter in the images of the environment and adjusts the search according to the information fed back by its sensors until the target interaction object is found.
The server is responsible for analyzing the feedback information of the robot and generating appropriate search subtasks to determine the target interactive object. The robot receives and executes the subtasks, moves to the designated position by using the navigation subtasks, and searches through the sensor until the task target is reached. The information transmission between the server and the robot is tightly matched with task execution, so that the task can be effectively completed.
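A server-side sketch of building such a search subtask is given below; locate_candidates is a hypothetical perception routine that analyzes the sensor data streamed back by the robot:

def build_search_subtask(missing_object, sensor_frames, locate_candidates):
    # Estimate where the missing object might be from the robot's sensor data.
    candidates = locate_candidates(sensor_frames, missing_object)
    return {
        "type": "search",
        "target_object": missing_object,                 # e.g. "cutter"
        "search_region": candidates[0] if candidates else "last_known_area",
        "use_sensors": ["vision"],    # keep detecting while moving
    }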
In one or more embodiments of the present application, after the searching subtask for searching the target interactive object using a sensor is sent to the robot, further comprising:
If a search result fed back by the robot for searching the target interactive object is received, regenerating the navigation subtask or the action subtask corresponding to the unachieved task target;
and sending the regenerated navigation subtask or the action subtask to the robot so as to enable the robot to achieve a task.
When the server receives the search result fed back by the robot indicating that the target interaction object has been found, it can regenerate, according to this information, the navigation subtask or the action subtask corresponding to the unachieved task target. For the navigation subtask, the server may update the navigation target of the task with the new target location information, so that the robot can move to the target address of the target interaction object as quickly and efficiently as possible. For example, if the cutter has been found and its target address in the kitchen confirmed, the server may generate a new navigation subtask instructing the robot to move from its current location to that target address or to within an effective distance of it. For the action subtask, if the previous task requires the robot to perform certain operations or actions related to the target interaction object, the server may regenerate the action subtask to ensure that the robot interacts with the found target interaction object. For example, in the kiwi-fruit cutting task, once the cutter has been found, the server may generate a new action subtask instructing the robot to pick up the cutter accurately and complete the cutting. After generating the adjusted navigation subtask or action subtask, the server sends it to the robot, so that the robot can re-plan its path or execute the action according to the new instruction and complete the whole task. After receiving the regenerated subtask, the robot adjusts its behavior accordingly and continues to execute the task under the new guidance to achieve the task target.
By regenerating the navigation subtask or the action subtask, the robot can update according to the search result so as to improve the accuracy and efficiency of task execution and ensure that the robot can successfully complete the task. This feedback and adjustment cycling process may provide the robot with greater flexibility and adaptability to handle changing environments and conditions during execution of tasks.
In one or more embodiments of the present application, the sending the navigation parameter corresponding to the navigation subtask and the action parameter corresponding to the action subtask to the robot includes:
Generating navigation parameters from the current address of the robot to the target address according to the sensor information provided by the robot and the target address;
generating action parameters of the mechanical tail end according to the sensor information provided by the robot and the current tail end gesture of the mechanical tail end of the robot;
and sending the navigation parameters and/or the action parameters to the robot according to the interactive execution sequence.
The server generates the navigation parameters from the robot's current address to the target address, namely a navigation path, according to the sensor information and the target address provided by the robot, using technologies such as a semantic map and simultaneous localization and mapping (Simultaneous Localization and Mapping, SLAM). The navigation parameters may include information such as the coordinates of the target position and the heading angle of the robot. By planning the navigation path, the server determines where the robot should travel and generates a series of navigation instructions that move the robot along the path. Meanwhile, the server also generates the action parameters of the mechanical end according to the sensor information provided by the robot and the current end pose of the mechanical end. The action parameters may include the specific motion to be performed, a pose adjustment, or the path of an arm movement. For example, in the kiwi-fruit cutting task, the server may generate the cutting pose to be adopted according to the current pose of the robot's end, or generate arm-path adjustments by analyzing the visual information collected during task execution. The server sends the navigation parameters and/or the action parameters to the robot at appropriate times according to the interactive execution sequence. For example, before the task is executed, the navigation parameters are sent first so that the robot moves to the target position along the navigation path; then, after the robot reaches the target position, the action parameters are sent to guide the robot to perform the specific interaction action. In this way, the server is responsible for planning and generating the navigation parameters and action parameters, and instructs the robot to perform the task by sending these parameters to it. This division of labor improves the efficiency and accuracy of task execution, makes full use of the cooperative advantages of the robot and the server, and enables the task to be completed smoothly.
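The parameter-generation step can be sketched as follows; the Pose fields and parameter names are illustrative assumptions, and a real system would obtain the poses from the SLAM/semantic-map layer:

import math
from dataclasses import dataclass

@dataclass
class Pose:
    x: float
    y: float
    theta: float                       # heading in radians

def navigation_params(current: Pose, target_xy):
    # Navigation parameter: goal coordinates plus the heading toward the goal.
    dx, dy = target_xy[0] - current.x, target_xy[1] - current.y
    return {"goal_x": target_xy[0], "goal_y": target_xy[1],
            "goal_heading": math.atan2(dy, dx)}

def action_params(end_pose: Pose, object_xy):
    # Action parameter: how far and at what angle the mechanical end must move.
    dx, dy = object_xy[0] - end_pose.x, object_xy[1] - end_pose.y
    return {"reach_distance": math.hypot(dx, dy),
            "approach_angle": math.atan2(dy, dx) - end_pose.theta}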
Based on the same thought, the embodiment of the application also provides another robot control method which is applied to the robot. Fig. 4 is a schematic diagram of another robot control method according to an embodiment of the present application, where the method specifically includes the following steps:
Step 401: receiving a navigation subtask and an action subtask provided by a server, or receiving navigation parameters corresponding to the navigation subtask and action parameters corresponding to the action subtask; the navigation subtask is used for guiding the robot to navigate to a target address of any target interaction object; the action subtask is to guide the robot to realize limb actions aiming at any target interaction object to achieve a task target.
Step 402: and sequentially executing the navigation subtask and the action subtask according to the interactive execution sequence.
Step 403: and if the execution result is matched with the task request provided by the user, feeding back the end of task execution to the server.
In practical application, in order to execute a task, the robot receives the navigation subtask and the action subtask provided by the server, or receives the navigation parameters corresponding to the navigation subtask and the action parameters corresponding to the action subtask. The purpose of the navigation subtask is to guide the robot to navigate to the target address of the specified target interaction object. If the robot has autonomous navigation capability, it can directly receive the navigation subtask and navigate to the target address. If it does not, it moves to the target address along the planned navigation path according to its current position and orientation and the navigation parameters provided by the server. By executing the navigation subtask, the robot can accurately reach the position of the target interaction object, which provides the spatial precondition for the subsequent execution of the action subtask. After the robot reaches the target address or comes within the effective distance range, it may receive an action subtask, which is intended to guide the robot to interact with the target interaction object through limb actions. If the robot does not have the capability to receive and execute the action subtask itself, the server may provide the specific limb action parameters that achieve the task goal of the action subtask. For example, in the kiwi-fruit cutting task, the action subtask may include the robot's arm path planning parameters, the cutting action sequence parameters, the tool pose parameters, and the like. By executing the action subtask, the robot can perform the required limb actions on the target interaction object to complete the specific task target.
The mechanical end, limbs and the like of the robot may take various forms: a multi-axis mechanical arm, a flexible magnetic end, or any other end or limb that the robot can control to perform a given task; they all belong to the robot-controllable components mounted on the robot. The shape and working capability of the limbs and mechanical ends are not limited, and different mechanical ends or limb parts can be configured for the robot according to actual requirements.
In the interactive execution process, the robot sequentially executes the navigation subtasks and the action subtasks according to the execution sequence of the interaction. First, the robot performs a navigation subtask, moving along a navigation path to a target location. Once the robot reaches the target location, it will perform an action subtask, performing a limb action with the target interactive object. By gradually executing the subtasks in accordance with a predefined execution sequence, the robot is able to gradually complete the entire task through the cooperation of navigation and actions. In the execution process of the navigation subtask and the action subtask, task planning is required to be performed according to sensor information acquired by a sensor of the robot in real time, so that accurate navigation parameters and/or action parameters can be provided for the robot.
For example, suppose the navigation subtask is to find a tool. Although the task target of the navigation subtask is known, finding the tool still requires images of the surrounding environment to be acquired in real time through a vision sensor. The robot sends the acquired images to the server in real time, and the server performs image recognition; because the server's computing power is stronger, fast and accurate recognition can be achieved. If the tool is recognized, the recognition result is sent to the robot together with a navigation parameter (for example, move forward 4 meters and then 2 meters to the left) that guides the robot to the position where the tool is located. While moving, the robot needs to use its sensors to avoid obstacles.
When the robot completes the task, it checks the execution result against the task request. If the execution result matches the task request provided by the user, the robot feeds back to the server that task execution has ended. In this way, the server can confirm whether the task completed successfully and handle subsequent operations or instructions. The feedback indicating the end of task execution may include the task state, verification of the execution result, or other related information, so that the server can confirm the robot's execution status.
Through this scheme, the server and the robot cooperate to execute the task request and feed back its result, so that the robot can handle tasks effectively. In practical applications, depending on the complexity of the task and the capabilities of the robot, one may choose to send navigation parameters together with action parameters to the robot, or to send the subtasks themselves; selecting the appropriate way according to the requirements of the task and the design of the robot's execution strategy can increase the efficiency and flexibility of task execution. The computation-heavy work of task decomposition and subtask decomposition can be performed by the server, while the robot only needs to execute the corresponding actions according to the parameters provided by the server, which effectively improves the efficiency and precision of task execution while reducing the cost of the robot.
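A robot-side sketch of this receive-execute-report loop is shown below; channel, execute_navigation and execute_action are assumed interfaces to the communication layer and the robot's own controllers:

def robot_main_loop(channel, execute_navigation, execute_action):
    plan = channel.receive()                  # subtasks plus execution order
    for task in plan["subtasks"]:
        if task["type"] == "navigation":
            ok = execute_navigation(task)
        else:
            ok = execute_action(task)
        if not ok:                            # task target not achieved
            channel.send({"status": "unachieved", "task": task})
            return
    channel.send({"status": "done"})          # result matches the request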
In one or more embodiments of the present application, the sequentially executing the navigation subtask and the action subtask in the interactive execution order includes:
judging whether the target interaction object is within the effective distance of the robot or not through a sensor;
if the target interaction object is within the effective distance, executing the action subtask, and executing the next action subtask or the navigation subtask according to the interactive execution sequence after the corresponding task target is achieved;
And if the target interaction object is not within the effective distance, executing the navigation subtask, and executing the next action subtask or the navigation subtask according to the interactive execution sequence after the corresponding task target is achieved.
When actually executing the task, the robot can judge whether the target interaction object is within the effective distance of the robot through the sensor. The effective distance refers to the range over which the mechanical end of the robot can contact or manipulate the target interactive object. These sensors may be visual sensors, tactile sensors, or other types of sensors for detecting the position, shape, and characteristics of the target interactive object.
If the target interaction object is within the effective distance of the robot, the robot will perform the corresponding action subtask. Executing the action subtask means that the robot will interact with the target interactive object through the limb actions according to the task requirements and the provided action parameters. For example, in a task of picking up a cup, the robot may perform a gripping action using a robot arm and then lift the cup. After the current action subtask is executed, the robot decides whether to continue to execute the next action subtask or to go to the navigation subtask according to the interactive execution sequence.
If the target interactive object is not within the effective distance of the robot, the robot will perform a navigation subtask. The navigation subtask is intended to guide the robot to move to the position where the target interaction object is located. The robot moves according to the planned navigation path and the navigation parameters and the current position information until reaching the target address. Once the robot completes the navigation subtask and reaches the target location, it will decide whether to continue to execute the next action subtask or go to the next navigation subtask according to the interactive execution sequence.
By selecting the execution of the action subtask or the navigation subtask depending on whether the target interactive object is within the effective distance of the robot, the robot can perform appropriate operations for different situations. If the target interaction object is within the effective distance, the robot directly executes the corresponding action subtasks to interact. And if the target interaction object exceeds the effective distance, the robot moves to the target position for interaction according to the navigation subtasks. Such a procedure may ensure that the robot interacts with the target interaction object within a suitable distance to complete the task objective. By comprehensively utilizing the sensor, the navigation subtask and the action subtask, the robot can effectively interact and operate with a target interaction object under a given interaction execution sequence according to the requirements of a task request and the capability of the robot, so as to achieve the set target of the task.
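The dispatch rule can be condensed into a few lines; within_effective_distance wraps the sensor check described above, and do_action / do_navigation are assumed executors:

def handle_target(target_object, within_effective_distance,
                  do_action, do_navigation):
    # If the object is reachable, run the action subtask directly; otherwise
    # run the navigation subtask first and perform the action afterwards.
    if not within_effective_distance(target_object):
        do_navigation(target_object)
    return do_action(target_object)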
In one or more embodiments of the present application, the performing the action subtask if within the effective distance includes:
Determining a first relative positional relationship of the target interactive object and a mechanical end of the robot based on a sensor;
determining an action parameter of the mechanical end based on the first relative positional relationship and a current end pose of the mechanical end;
and executing the action subtask based on the action parameter.
In practical application, the first relative positional relationship between the target interaction object and the mechanical end of the robot is determined based on a sensor. The position information of the target interaction object is acquired through a sensor (such as a vision sensor) and compared with the current position of the robot's mechanical end to determine the relative positional relationship, such as the relative angle and the relative distance. This helps the robot accurately judge the position and direction of the target interaction object relative to the mechanical end, so that the action parameters can then be determined accurately. The first relative positional relationship and the action parameters may be determined by the robot or by the server, depending on the capabilities of the robot itself.
The action parameters of the mechanical end are then determined based on the first relative positional relationship and the current end pose of the mechanical end. Combining the position information acquired by the sensors and the current end pose with the task requirements and the capabilities of the robot, the specific parameters required for the mechanical end to execute the action are calculated, such as the path plan, the action sequence and the pose control of the action to be executed. These parameters guide the robot's limb movements and manner of operation when it performs the action subtask. When the action parameters of the mechanical end are determined, besides the first relative positional relationship and the current end pose, the characteristics of the mechanical end also need to be considered, for example whether it is a manipulator or a magnetic end: the action parameters required by a manipulator include the angle, direction and speed of each axis and joint motion, while a magnetic end additionally requires the magnitude of the magnetic attraction to be considered.
The action subtask is then executed based on the action parameters. According to the determined action parameters, the robot uses its own actuators, joint controllers or other actuating mechanisms to execute the specific actions required for interacting with the target interaction object. For example, in the kiwi-fruit cutting task, the action parameters may include the path and speed of the cutting action, according to which the robot moves the cutter and performs the cut.
Through this scheme, the robot can determine the specific action parameters required to execute an action according to the information acquired by the sensors and the current end pose, and execute the action subtask to interact with the target interaction object. This process ensures that the robot controls its actions accurately, according to the position of the target object and the end pose, and completes the interaction operations required by the task.
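A rough sketch of turning the first relative positional relationship and the end type into action parameters follows; the field names, default speed and end types are illustrative assumptions only:

def end_action_params(relative_xyz, end_type):
    # relative_xyz: offset of the target object from the current end pose
    # (the first relative positional relationship expressed in the end frame).
    params = {"translate": relative_xyz,         # displacement the end must make
              "speed": 0.1}                      # m/s, conservative default
    if end_type == "magnetic":
        params["magnet_on"] = True               # engage the magnetic attraction
    else:                                        # multi-axis manipulator
        params["gripper"] = "close_after_reach"  # grasp once contact is made
    return params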
In one or more embodiments of the present application, the performing the navigation subtask if not within the effective distance includes:
determining a second relative positional relationship between the robot's body position and a target address of the target interactive object based on the sensor;
and executing the navigation subtask based on the second relative positional relationship, so that the target interaction object is within the effective distance of the robot after the robot moves.
A second relative positional relationship between the body position of the robot and the target address of the target interactive object is determined based on the sensor. The position information of the robot body and the target address information of the target interaction object are acquired through a sensor (such as a vision sensor), and then the relative position relation between the position information and the target address information is calculated, wherein the information can comprise the azimuth angle, the distance, the offset and the like of the target interaction object relative to the robot.
The navigation subtask is executed so that, after the robot moves, the target interaction object is located within the effective distance of the robot. Based on the second relative positional relationship, the robot plans an appropriate navigation path and movement strategy according to the task requirements and the navigation algorithm, so as to move itself to a position sufficiently close to the target interaction object. The navigation subtask may involve path planning, obstacle avoidance, speed control and similar problems, to ensure that the robot moves safely to the target location.
After the robot reaches the target position, the robot judges whether to execute the next action subtask or continue to execute the navigation subtask according to the task execution sequence. When the target interaction object is located within the effective distance and the robot has reached the target position, the robot will perform the action subtask to interact with the target interaction object.
Through the scheme, the robot can conduct navigation subtasks and move to the target position according to the relative position relation between the information acquired by the sensor and the target address. Such a procedure may ensure that the robot is able to reliably navigate and move to the location of the target interactive object for subsequent interactive operations.
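One way to pick the navigation goal from the second relative positional relationship is to stop just inside the end's reach; the sketch below assumes a flat 2-D workspace and leaves obstacle avoidance to the motion layer:

import math

def approach_goal(robot_xy, object_xy, reach, margin=0.05):
    # Return a position from which the object lies within the end's reach.
    dx, dy = object_xy[0] - robot_xy[0], object_xy[1] - robot_xy[1]
    dist = math.hypot(dx, dy)
    if dist <= reach:
        return robot_xy                          # already close enough
    travel = dist - (reach - margin)             # stop just inside the reach
    return (robot_xy[0] + dx * travel / dist,
            robot_xy[1] + dy * travel / dist)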
In one or more embodiments of the present application, the determining, by the sensor, whether the target interactive object is within the robot effective distance includes:
judging whether the target interaction object is in the visual range of the robot or not through a visual sensor;
If so, determining a first distance based on a second relative positional relationship between the body position of the robot and the target address of the target interactive object;
determining a second distance, namely the limit length that the mechanical end can contact, based on the body position of the robot;
If the first distance is smaller than the second distance, the target interaction object is within the effective distance of the robot;
And if the first distance is not smaller than the second distance, the target interaction object is not in the effective distance of the robot.
And judging whether the target interaction object is in the visual range of the robot or not through a visual sensor. The robot may use visual sensors (e.g., cameras) to detect the surrounding environment and identify the target interactive object. If the target interactive object is visible in the visual field range of the robot, continuing to execute the next step; otherwise, the target interactive object is not in the visible range and can be determined not to be in the effective distance.
If the target interaction object is within the visual range of the robot, the first distance is determined based on the second relative positional relationship between the body position of the robot and the target address of the target interaction object. The distance between the target interaction object and the robot is obtained by calculating the relative positional relationship between the robot's body position and the target address of the target interaction object, for example as a Euclidean distance or a Manhattan distance.
The second distance, namely the limit length that the mechanical end can contact, is determined based on the body position of the robot; the robot can determine the maximum distance its mechanical end can effectively reach according to its own construction and specification. If the first distance is smaller than the second distance, the target interaction object is within the effective distance of the robot: the distance between the target interaction object and the robot is within the range that the mechanical end can reach. If the first distance is not smaller than the second distance, the target interaction object is not within the effective distance of the robot: the distance between them exceeds the range that the mechanical end can reach.
Through the above steps, the robot can determine whether the target interactive object is within the visual range using the visual sensor, and determine whether it is within the effective distance of the robot through the distance comparison. Such a determination may help the robot determine whether a navigation sub-task or a direct execution action sub-task is needed when performing a task.
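The effective-distance test itself reduces to a visibility check plus a distance comparison, as in this sketch (the inputs are assumed to come from the vision sensor and the robot's specification):

import math

def within_effective_distance(object_visible, robot_xy, object_xy, reach_limit):
    if not object_visible:            # not in the visual range of the robot
        return False
    first_distance = math.dist(robot_xy, object_xy)   # body to object
    second_distance = reach_limit                     # limit reach of the end
    return first_distance < second_distance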
In one or more embodiments of the present application, further comprising: when the navigation subtask and the action subtask are executed according to the interactive execution sequence, if the task target is not achieved, feeding back the information of the task target which is not achieved and a corresponding target interaction object to the server;
receiving a searching subtask provided by the server; the generation mode of the searching subtasks comprises the following steps: and the server generates a searching subtask aiming at the target interaction object by utilizing the sensor information fed back by the robot and the target of the unrealized task.
And if the target interaction object is searched, re-executing the navigation subtask or the action subtask corresponding to the unachieved task target so as to ensure that the robot achieves the task.
During execution of the navigation subtask and the action subtask, if the robot fails to reach the task target, the robot will feed back information that the task target is not reached, and the corresponding target interaction object, to the server. After receiving the information fed back by the robot, the server can generate a search subtask aiming at the target interaction object according to the sensor information and the target of the unrealized task. The manner in which the search subtasks are generated may include:
using sensor information fed back by the robot, such as images of vision sensors or scan data of a lidar, the server may analyze the environment and determine the likely location of the target interactive object. This may be achieved by techniques such as target object recognition, object localization or environmental modeling.
Based on the target interaction object for which the task goal was not achieved, the server determines that the search target is that target interaction object. The server then generates a search subtask for the target interaction object according to its likely position and the current position information of the robot. The search subtask may include navigation instructions that direct the robot to move to the predicted location to search, or a description of the features of the target interaction object to assist the robot in finding it in the environment. When the robot executes the search subtask and successfully finds the target interaction object, it re-executes the navigation subtask or the action subtask corresponding to the unachieved task target, so that the task is completed.
Through this scheme, the robot can feed back to the server that a task target has not been achieved, and re-execute the navigation subtask or the action subtask according to the search subtask provided by the server, so as to achieve the task target. This feedback-and-search mechanism increases the adaptability of the robot, enabling it to interact with the server when it encounters difficulties or cannot complete a task, and to take further action to complete the task.
Based on the same thought, the embodiment of the application also provides a robot control device. Fig. 5 is a schematic structural diagram of a robot control device according to an embodiment of the present application. As can be seen from fig. 5, the device specifically comprises the following modules:
the obtaining module 51 is configured to obtain a task request provided by a user.
The decomposition module 52 is configured to decompose the task request by using the pre-training large model to obtain a navigation subtask and an action subtask; the navigation subtask is used for guiding the robot to navigate to a target address of any target interaction object; the action subtask is to guide the robot to realize limb actions aiming at any target interaction object to achieve a task target.
And the sending module 53 is configured to send the navigation subtask and the action subtask to a robot, or send a navigation parameter corresponding to the navigation subtask and an action parameter corresponding to the action subtask to the robot, so that the robot sequentially executes the navigation subtask and the action subtask according to an interactive execution sequence.
An obtaining module 51, configured to receive the task request in a voice form or a text form provided by the user;
judging whether the task request contains complete subtask elements for guiding task splitting;
if the complete subtask element is not contained, determining a missing target interaction object and/or an interaction element aiming at the target interaction object;
transmitting interaction information to the user based on the absent target interaction object and/or interaction elements for the target interaction object;
And receiving feedback information which is provided by the user based on the interaction information and carries the target interaction object and/or the interaction element aiming at the target interaction object.
A decomposition module 52, configured to determine a target interaction object included in a process for completing the task request, and an interaction execution sequence of the robot with a plurality of the target interaction objects;
And generating the navigation subtask and the action subtask based on the target interaction object and the interaction execution sequence.
The sending module 53 is configured to send the navigation subtasks and the action subtasks to a robot in sequence according to an interactive execution sequence; or alternatively
And sending the navigation subtask, the action subtask and the interactive execution sequence information to a robot.
Optionally, the sending module 53 is further configured to determine, when receiving information that the robot feedback does not reach the task target while executing the navigation subtask and the action subtask according to the interactive execution order, a target interaction object corresponding to the task target that is not reached;
Generating a search subtask aiming at the target interaction object based on the sensor information fed back by the robot and the unachieved task target;
And sending the searching subtask for searching the target interactive object by using a sensor to the robot.
Optionally, the sending module 53 is further configured to, if a search result of the target interactive object, which is fed back by the robot, is received, regenerate the navigation subtask or the action subtask corresponding to the unachieved task target;
and sending the regenerated navigation subtask or the action subtask to the robot so as to enable the robot to achieve a task.
Optionally, the sending module 53 is further configured to generate a navigation parameter from a current address of the robot to the target address according to the sensor information provided by the robot and the target address;
generating action parameters of the mechanical tail end according to the sensor information provided by the robot and the current tail end gesture of the mechanical tail end of the robot;
and sending the navigation parameters and/or the action parameters to the robot according to the interactive execution sequence.
Based on the same thought, the embodiment of the application also provides another robot control device. Fig. 6 is a schematic structural diagram of another robot control device according to an embodiment of the present application. As can be seen from fig. 6, the device specifically comprises the following modules:
the receiving module 61 is configured to receive a navigation subtask and an action subtask provided by a server, or receive a navigation parameter corresponding to the navigation subtask and an action parameter corresponding to the action subtask; the navigation subtask is used for guiding the robot to navigate to a target address of any target interaction object; the action subtask is to guide the robot to realize limb actions aiming at any target interaction object to achieve a task target.
And the execution module 62 is used for sequentially executing the navigation subtask and the action subtask according to the interactive execution sequence.
And the feedback module 63 is configured to, if the execution result matches the task request provided by the user, feed back to the server end that the task execution is ended.
An execution module 62, configured to determine, by using a sensor, whether the target interactive object is within the effective distance of the robot;
if the target interaction object is within the effective distance, execute the action subtask, and execute the next action subtask or the navigation subtask according to the interactive execution sequence after the corresponding task target is achieved;
And if the target interaction object is not within the effective distance, execute the navigation subtask, and execute the next action subtask or the navigation subtask according to the interactive execution sequence after the corresponding task target is achieved.
An execution module 62 for determining a first relative positional relationship of the target interactive object and the mechanical end of the robot based on a sensor;
determining an action parameter of the mechanical end based on the first relative positional relationship and a current end pose of the mechanical end;
and executing the action subtask based on the action parameter.
An execution module 62 for determining a second relative positional relationship between the robot's body position and the target address of the target interactive object based on the sensor;
and executing the navigation subtask based on the second relative positional relationship so that the target interaction object is within the effective distance of the robot after the robot moves.
An execution module 62, configured to determine, by using a vision sensor, whether the target interactive object is within a visual range of the robot;
If so, determining a first distance based on a second relative positional relationship between the body position of the robot and the target address of the target interactive object;
determining a second distance, namely the limit length that the mechanical end can contact, based on the body position of the robot;
If the first distance is smaller than the second distance, the target interaction object is within the effective distance of the robot;
And if the first distance is not smaller than the second distance, the target interaction object is not in the effective distance of the robot.
The execution module 62 is configured to, when the navigation subtask and the action subtask are executed according to the interactive execution sequence, if a task target is not achieved, feed back information of the task target not achieved and a corresponding target interaction object to the server;
receiving a searching subtask provided by the server; the generation mode of the searching subtasks comprises the following steps: and the server generates a searching subtask aiming at the target interaction object by utilizing the sensor information fed back by the robot and the target of the unrealized task.
And the execution module 62 is configured to re-execute the navigation subtask or the action subtask corresponding to the task target if the target interaction object is found, so as to enable the robot to achieve the task.
In one possible design, the structure of the robot control device shown in fig. 5 and 6 may be implemented as an electronic device. As shown in fig. 7, the electronic device may include: a processor 71 and a memory 72. The memory 72 has stored thereon executable code which, when executed by the processor 71, at least enables the processor 71 to implement the robot control method provided in the previous embodiments. The electronic device may also include a communication interface 73 for communicating with other devices or communication networks.
In addition, an embodiment of the present invention provides a non-transitory machine-readable storage medium having executable code stored thereon, which when executed by a processor in a server or a computer, causes the processor to execute the robot control method corresponding to fig. 3 and 4 provided in the foregoing embodiments.
The apparatus embodiments described above are merely illustrative, wherein the various modules illustrated as separate components may or may not be physically separate. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
From the above description of embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by adding necessary general purpose hardware platforms, or may be implemented by a combination of hardware and software. Based on such understanding, the foregoing aspects and their substantial or contributing portions may be embodied in the form of a computer product, which may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (20)

1. A robot control method, applied to a server, the method comprising:
Acquiring a task request provided by a user;
Decomposing the task request by utilizing the pre-training large model to obtain a navigation subtask and an action subtask; the navigation subtask is used for guiding the robot to navigate to a target address of any target interaction object; the action subtask is to guide the robot to realize limb actions aiming at any target interaction object to achieve a task target;
and sending the navigation subtask and the action subtask to a robot, or sending navigation parameters corresponding to the navigation subtask and action parameters corresponding to the action subtask to the robot, so that the robot can execute the navigation subtask and the action subtask in sequence according to the interactive execution sequence.
2. The method of claim 1, wherein the obtaining a user-provided task request comprises:
Receiving the task request in a voice form or a text form provided by the user;
judging whether the task request contains complete subtask elements for guiding task splitting;
if the complete subtask element is not contained, determining a missing target interaction object and/or an interaction element aiming at the target interaction object;
transmitting interaction information to the user based on the absent target interaction object and/or interaction elements for the target interaction object;
And receiving feedback information which is provided by the user based on the interaction information and carries the target interaction object and/or the interaction element aiming at the target interaction object.
3. The method of claim 2, wherein decomposing the task request to obtain a navigation subtask and an action subtask comprises:
Determining target interaction objects contained in a flow for completing the task request and an interaction execution sequence of the robot and a plurality of the target interaction objects;
And generating the navigation subtask and the action subtask based on the target interaction object and the interaction execution sequence.
4. The method of claim 2, wherein the sending the navigation subtask and the action subtask to a robot comprises:
Sequentially sending the navigation subtasks and the action subtasks to a robot according to an interactive execution sequence; or alternatively
And sending the navigation subtask, the action subtask and the interactive execution sequence information to a robot.
5. The method as recited in claim 4, further comprising:
when the navigation subtask and the action subtask are executed according to the interactive execution sequence, if the information of the task target which is not achieved by the robot feedback is received, determining a target interaction object corresponding to the task target which is not achieved;
Generating a search subtask aiming at the target interaction object based on the sensor information fed back by the robot and the unachieved task target;
And sending the searching subtask for searching the target interactive object by using a sensor to the robot.
6. The method of claim 5, wherein following the search subtask to the robot to search for the target interactive object using a sensor, further comprising:
If a search result fed back by the robot for searching the target interactive object is received, regenerating the navigation subtask or the action subtask corresponding to the unachieved task target;
and sending the regenerated navigation subtask or the action subtask to the robot so as to enable the robot to achieve a task.
7. The method of claim 1, wherein the sending the navigation parameters corresponding to the navigation subtasks and the action parameters corresponding to the action subtasks to the robot comprises:
Generating navigation parameters from the current address of the robot to the target address according to the sensor information provided by the robot and the target address;
generating action parameters of the mechanical tail end according to the sensor information provided by the robot and the current tail end gesture of the mechanical tail end of the robot;
and sending the navigation parameters and/or the action parameters to the robot according to the interactive execution sequence.
8. A robot control method, applied to a robot, comprising:
receiving a navigation subtask and an action subtask provided by a server, or receiving navigation parameters corresponding to the navigation subtask and action parameters corresponding to the action subtask; the navigation subtask is used for guiding the robot to navigate to a target address of any target interaction object; the action subtask is used for guiding the robot to perform limb actions on any target interaction object so as to achieve a task target;
sequentially executing the navigation subtask and the action subtask according to the interaction execution sequence;
and if the execution result matches the task request provided by the user, feeding back to the server that the task execution is finished.
9. The method of claim 8, wherein sequentially executing the navigation subtask and the action subtask according to the interaction execution sequence comprises:
judging, through a sensor, whether the target interaction object is within an effective distance of the robot;
if the target interaction object is within the effective distance, executing the action subtask, and after the corresponding task target is achieved, executing the next action subtask or navigation subtask according to the interaction execution sequence;
and if the target interaction object is not within the effective distance, executing the navigation subtask, and after the corresponding task target is achieved, executing the next action subtask or navigation subtask according to the interaction execution sequence.
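A schematic rendering of the claim 9 control loop follows; `within_effective_distance`, `navigate` and `manipulate` are placeholders for the robot's own perception and motion stacks, and the subtask dictionary layout is assumed for illustration.

```python
def execute_plan(subtasks_in_order, within_effective_distance, navigate, manipulate):
    """For each subtask, check through a sensor whether the target interaction
    object is within the effective distance: act directly if it is, otherwise
    run the navigation subtask, then continue in interaction execution sequence."""
    for subtask in subtasks_in_order:
        obj = subtask["target_object"]
        if within_effective_distance(obj):
            # Within reach: the action subtask can be executed immediately.
            manipulate(obj, subtask.get("limb_action", "grasp"))
        else:
            # Out of reach: execute the navigation subtask for this object first.
            navigate(subtask["target_address"])
        # After the corresponding task target is achieved, the loop moves on to
        # the next action or navigation subtask in the interaction execution sequence.


# Tiny demo with stubbed perception and motion functions.
execute_plan(
    [{"target_object": "cup", "target_address": "kitchen shelf", "limb_action": "grasp"}],
    within_effective_distance=lambda obj: False,
    navigate=lambda address: print(f"navigating to {address}"),
    manipulate=lambda obj, act: print(f"{act} {obj}"),
)
```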
10. The method of claim 9, wherein executing the action subtask if the target interaction object is within the effective distance comprises:
determining a first relative positional relationship between the target interaction object and a mechanical end of the robot based on a sensor;
determining an action parameter of the mechanical end based on the first relative positional relationship and a current end pose of the mechanical end;
and executing the action subtask based on the action parameter.
11. The method of claim 9, wherein executing the navigation subtask if the target interaction object is not within the effective distance comprises:
determining a second relative positional relationship between the body position of the robot and a target address of the target interaction object based on the sensor;
and executing the navigation subtask based on the second relative positional relationship, so that the target interaction object is within the effective distance of the robot after the robot moves.
12. The method of claim 10, wherein judging, through the sensor, whether the target interaction object is within the effective distance of the robot comprises:
judging whether the target interaction object is within the visual range of the robot through a visual sensor;
if so, determining a first distance based on a second relative positional relationship between the body position of the robot and the target address of the target interaction object;
determining a second distance, being the limit length that the mechanical end can reach, based on the body position of the robot;
if the first distance is smaller than the second distance, the target interaction object is within the effective distance of the robot;
and if the first distance is not smaller than the second distance, the target interaction object is not within the effective distance of the robot.
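Claim 12 reads almost directly as code. The sketch below is a literal rendering under simplifying assumptions: the visibility check, body position and reach limit are stub inputs rather than real sensor outputs.

```python
import math


def is_within_effective_distance(object_in_view: bool,
                                 body_position: tuple,
                                 target_address: tuple,
                                 arm_reach_limit: float) -> bool:
    """Return True when the target interaction object can be reached by the
    mechanical end from the robot's current body position."""
    if not object_in_view:
        # Not within the visual range of the robot: cannot be within effective distance.
        return False
    # First distance: body position to target address (second relative positional relationship).
    first_distance = math.dist(body_position, target_address)
    # Second distance: the limit length the mechanical end can reach from the body.
    second_distance = arm_reach_limit
    return first_distance < second_distance


print(is_within_effective_distance(True, (0.0, 0.0), (0.4, 0.3), arm_reach_limit=0.8))  # True
```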
13. The method as recited in claim 8, further comprising:
when the navigation subtask and the action subtask are executed according to the interaction execution sequence, if a task target is not achieved, feeding back information on the unachieved task target and the corresponding target interaction object to the server;
and receiving a search subtask provided by the server; wherein the search subtask is generated by the server for the target interaction object by using the sensor information fed back by the robot and the unachieved task target.
14. The method of claim 13, further comprising, after receiving the search subtask provided by the server:
if the target interaction object is found through searching, re-executing the navigation subtask or the action subtask corresponding to the unachieved task target, so that the robot achieves the task target.
15. A robotic control system, the system comprising:
the server obtains a task request provided by a user by using a first system layer, decomposes the task request by using a pre-trained large model, generates, by using a second system layer and based on the decomposition result, a navigation subtask and an action subtask executable by a robot, and sends the navigation subtask and the action subtask to the robot, or sends navigation parameters corresponding to the navigation subtask and action parameters corresponding to the action subtask to the robot;
a third system layer in the robot sequentially executes the navigation subtask and the action subtask according to the received interaction execution sequence, or sequentially executes them according to the interaction execution sequence based on the navigation parameters corresponding to the navigation subtask and the action parameters corresponding to the action subtask; the navigation subtask is used for guiding the robot to navigate to a target address of any target interaction object; the action subtask is used for guiding the robot to perform limb actions on any target interaction object so as to achieve a task target.
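The three system layers of claim 15 can be pictured as below. This is a compressed, single-process rendering for illustration; in practice the first and second layers would run on the server and the third on the robot, and the class names, planning rule and stand-in decomposition are all assumptions.

```python
class FirstSystemLayer:
    """Server side: receives the user's task request."""
    def get_task_request(self) -> str:
        return "Please bring me a cup of water"


class SecondSystemLayer:
    """Server side: turns the large-model decomposition into executable subtasks."""
    def plan(self, decomposition: list) -> list:
        return [{"type": t, "target_object": o} for t, o in decomposition]


class ThirdSystemLayer:
    """Robot side: executes navigation and action subtasks in interaction order."""
    def run(self, subtasks: list) -> None:
        for subtask in subtasks:
            print(f"executing {subtask['type']} subtask for {subtask['target_object']}")


request = FirstSystemLayer().get_task_request()
# Stand-in for the pre-trained large model's decomposition of the request.
decomposition = [("navigation", "cup"), ("action", "cup")]
ThirdSystemLayer().run(SecondSystemLayer().plan(decomposition))
```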
16. The system of claim 15, wherein, when the system comprises a plurality of robots, the server sends the navigation subtasks and the action subtasks to the plurality of robots according to the interaction execution sequence;
and after any one of the robots completes its navigation subtask or action subtask, it sends a subtask completion result to the server, so that the server notifies the next robot to execute the assigned navigation subtask and/or action subtask according to the interaction execution sequence.
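A toy sketch of the multi-robot hand-off in claim 16; the queue-based coordination, robot identifiers and subtask strings are assumptions, since the claim only requires that each completion report triggers the server to notify the next robot in order.

```python
from collections import deque


def coordinate(assignments: deque) -> None:
    """Hand out subtasks in interaction execution sequence; each completion
    report triggers the server to notify the next robot in the queue."""
    while assignments:
        robot_id, subtask = assignments.popleft()
        print(f"server -> {robot_id}: execute {subtask}")
        # The robot reports completion of its subtask back to the server,
        # after which the server notifies the next assignee.
        print(f"{robot_id} -> server: {subtask} completed")


coordinate(deque([
    ("robot-1", "navigation: cup"),
    ("robot-1", "action: grasp cup"),
    ("robot-2", "navigation: water dispenser"),
    ("robot-2", "action: press button"),
]))
```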
17. A robot control device, applied to a server, comprising:
the acquisition module is used for acquiring a task request provided by a user;
the decomposition module is used for decomposing the task request by using a pre-trained large model to obtain a navigation subtask and an action subtask; the navigation subtask is used for guiding the robot to navigate to a target address of any target interaction object; the action subtask is used for guiding the robot to perform limb actions on any target interaction object so as to achieve a task target;
and the sending module is used for sending the navigation subtask and the action subtask to the robot, or sending the navigation parameters corresponding to the navigation subtask and the action parameters corresponding to the action subtask to the robot, so that the robot executes the navigation subtask and the action subtask in sequence according to the interaction execution sequence.
18. A robot control device, characterized by being applied to a robot, the device comprising:
the receiving module is used for receiving the navigation subtask and the action subtask provided by the server, or receiving the navigation parameters corresponding to the navigation subtask and the action parameters corresponding to the action subtask; the navigation subtask is used for guiding the robot to navigate to a target address of any target interaction object; the action subtask is used for guiding the robot to perform limb actions on any target interaction object so as to achieve a task target;
the execution module is used for sequentially executing the navigation subtask and the action subtask according to the interaction execution sequence;
and the feedback module is used for feeding back to the server that the task execution is finished if the execution result matches the task request provided by the user.
19. An electronic device, the electronic device comprising: a memory and a processor; wherein,
the memory is used for storing a program;
and the processor, coupled to the memory, is used for executing the program stored in the memory to implement the method of any one of claims 1 to 14.
20. A computer readable storage medium, characterized in that the computer readable storage medium stores a program which, when executed, implements the steps of the method according to any one of claims 1 to 14.
CN202410558752.3A 2024-05-07 2024-05-07 Robot control method, system, device, equipment and medium Pending CN118404579A (en)

Priority Applications (1)

Application Number: CN202410558752.3A, Priority Date: 2024-05-07, Filing Date: 2024-05-07, Title: Robot control method, system, device, equipment and medium

Applications Claiming Priority (1)

Application Number: CN202410558752.3A, Priority Date: 2024-05-07, Filing Date: 2024-05-07, Title: Robot control method, system, device, equipment and medium

Publications (1)

Publication Number: CN118404579A (en), Publication Date: 2024-07-30

Family

ID=92004174

Family Applications (1)

Application Number: CN202410558752.3A, Priority Date: 2024-05-07, Filing Date: 2024-05-07, Status: Pending (CN118404579A), Title: Robot control method, system, device, equipment and medium

Country Status (1)

Country: CN (1), Link: CN118404579A (en)

Similar Documents

Publication Publication Date Title
JP7556930B2 (en) Autonomous robots with on-demand teleoperation
US20220212342A1 (en) Predictive robotic controller apparatus and methods
CN113677485B (en) Efficient Adaptation of Robot Control Strategies for New Tasks Using Meta-learning Based on Meta-imitation Learning and Meta-reinforcement Learning
US20200046169A1 (en) Robot system and control method of the same
CN114080583B (en) Visual teaching and repetitive movement manipulation system
Lampariello et al. Generating feasible trajectories for autonomous on-orbit grasping of spinning debris in a useful time
Beetz et al. Generality and legibility in mobile manipulation: Learning skills for routine tasks
EP4143649B1 (en) Service robot system, robot and method for operating the service robot
US20220161424A1 (en) Device and method for controlling a robotic device
Dragan et al. Teleoperation with intelligent and customizable interfaces
Stückler et al. Mobile manipulation, tool use, and intuitive interaction for cognitive service robot cosero
Sidiropoulos et al. A human inspired handover policy using gaussian mixture models and haptic cues
Ferrandis et al. Nonprehensile planar manipulation through reinforcement learning with multimodal categorical exploration
Oh et al. Learning to arbitrate human and robot control using disagreement between sub-policies
Gromov et al. Guiding quadrotor landing with pointing gestures
CN109933053B (en) Unmanned aerial vehicle control method based on maneuvering action chain and unmanned aerial vehicle
CN118404579A (en) Robot control method, system, device, equipment and medium
Yu et al. MHRC: Closed-loop Decentralized Multi-Heterogeneous Robot Collaboration with Large Language Models
JP5539001B2 (en) Control device
WO2023286138A1 (en) Robot control system, robot system, robot control method, and robot control program
Adjigble et al. Haptic-guided assisted telemanipulation approach for grasping desired objects from heaps
BARAD et al. Towards incremental autonomy framework for on-orbit vision-based grasping
Budolak et al. Semi-Autonomous Teleoperation, Guidance, and Obstacle Avoidance With Path Adherence
Montero et al. Solving Robot Assembly Tasks by Combining Interactive Teaching and Self-Exploration
Martinez et al. AI-Driven Robotics for Autonomous Navigation and Luggage Handling: Carry-My-Luggage Task for RoboCup@Home 2024 Using the TIAGo ROS Platform

Legal Events

Code: PB01, Title: Publication
Code: SE01, Title: Entry into force of request for substantive examination