CN119238612A - Humanoid robot flexible assembly method, system and medium based on thought chain - Google Patents
Humanoid robot flexible assembly method, system and medium based on thought chain
- Publication number: CN119238612A (application CN202411497531.6A)
- Authority: CN (China)
- Prior art keywords: task, assembly, humanoid robot, analysis, tree structure
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Abstract
An embodiment of the present application provides a thought-chain-based flexible assembly method, system and medium for a humanoid robot, belonging to the technical field of natural language processing. In this scheme, an assembly task instruction provided by a user is acquired and subjected to task analysis by a thought chain model. During analysis, an attention mechanism calculates the weight of each word in the assembly task instruction, the analysis targets related to the current task are dynamically adjusted according to the weights, and the task is decomposed layer by layer into a plurality of subtasks according to the analysis targets to obtain a task tree structure. The task tree structure is traversed to determine an assembly plan, which comprises an execution arrangement sequence, task path planning and assembly action parameters, and the humanoid robot is controlled to execute the assembly operation according to the assembly plan. No fixed instruction template needs to be preset: by combining a hierarchical thought chain with the attention mechanism, task information is extracted automatically, the task is converted into assembly operations, a clear execution plan is generated, and the flexibility of task analysis is improved.
Description
Technical Field
The application relates to the technical field of natural language processing, and in particular to a thought-chain-based humanoid robot flexible assembly method, system and medium.
Background
As manufacturing develops toward intelligent and flexible production, conventional fixed-process assembly systems struggle to meet the requirements of diversified, customized production. Flexible manufacturing systems, as a solution for complex and changeable production environments, can quickly adapt to different types of assembly tasks and product changes and improve the efficiency and flexibility of a production line. In this context, humanoid robots have gradually become important players in flexible manufacturing thanks to their human-like operation and flexible working capability.
However, current humanoid robots still face multiple challenges in complex task analysis and autonomous assembly. For example, new tasks require reprogramming or adjustment of the task flow, which increases both the development time of the system and the threshold and cost of use. Traditional robot task allocation and planning mostly rely on pre-written programs or fixed instruction templates; task analysis capability is insufficient, complex instructions are difficult to understand, and the approach is too rigid and inflexible when dealing with changeable assembly tasks, so it cannot meet the requirements of quick adjustment and real-time response. Meanwhile, task planning and execution often lack deep understanding of, and adaptability to, complex assembly processes, so operators must invest a great deal of time in task guidance and parameter adjustment, which greatly limits the automation level of the production line.
Disclosure of Invention
The application mainly aims to provide a thought-chain-based humanoid robot flexible assembly method, system and medium that, without relying on pre-written programs or fixed instruction templates, automatically and accurately extract key information from complex instructions, generate a clear execution plan, reduce development and operation costs, and realize intelligent flexible assembly.
To achieve the above object, one aspect of an embodiment of the present application provides a thought-chain-based flexible assembly method for a humanoid robot, the method comprising the following steps:
Acquiring an assembly task instruction provided by a user;
Task analysis is carried out on the assembly task instruction through a thought chain model, the weight of each word in the assembly task instruction is calculated through an attention mechanism in the analysis process, an analysis target related to the current task is dynamically adjusted according to the weight, and the task is decomposed into a plurality of subtasks layer by layer according to the analysis target, so that a task tree structure is obtained;
Traversing the task tree structure to determine an assembly plan, wherein the assembly plan comprises an execution arrangement sequence, a task path plan and assembly action parameters;
and controlling the humanoid robot to execute the assembly operation according to the assembly plan.
In some embodiments, the acquiring the assembly task instruction provided by the user includes the following steps:
acquiring multi-modal data provided by a user, wherein the multi-modal data comprises text, voice and visual information;
performing word segmentation, part-of-speech tagging and semantic analysis on the characters and the voices by adopting a natural language processing technology, and extracting main contents and intentions of tasks to obtain text information;
Performing feature extraction and recognition on the visual information by adopting a computer vision technology to obtain symbol information;
and mapping the text information and the symbol information to a vector space to obtain the assembly task instruction.
In some embodiments, calculating the weight of each word in the assembly task instruction through the attention mechanism and dynamically adjusting the analysis target related to the current task according to the weight includes the following steps:
calculating an attention score of each word in the assembly task instruction based on context information through an attention mechanism, wherein the context information comprises a task history state and domain knowledge;
normalizing the attention score to obtain the weight of the word;
and determining the words with the weights larger than a preset threshold as the analysis targets related to the current task.
In some embodiments, traversing the task tree structure to determine an assembly plan includes the following steps:
traversing the task tree structure to determine the sequence and the dependency relationship of the subtasks;
Performing logic verification on the task tree structure according to the sequence and the dependency relationship of the subtasks, and optimizing the subtask arrangement in the task tree structure based on an attention mechanism to obtain the execution arrangement sequence;
Determining the task path planning by adopting a motion planning algorithm according to the execution arrangement sequence;
performing task decomposition on the subtasks until the subtasks are decomposed into a plurality of basic actions which can be directly executed by the humanoid robot, and determining required assembly action parameters according to the basic actions;
And obtaining an assembly plan according to the execution arrangement sequence, the task path plan and the assembly action parameters.
In some embodiments, performing logic verification on the task tree structure according to the sequence and the dependency relationship of the subtasks and optimizing the subtask arrangement in the task tree structure based on an attention mechanism to obtain the execution arrangement sequence includes the following steps:
when a plurality of subtasks to be executed exist at the same time, evaluating the priority and the resource requirement of each subtask through an attention mechanism to obtain an evaluation result;
And determining a task scheduling strategy through a reinforcement learning algorithm according to the evaluation result, and adjusting the subtask arrangement in the task tree structure according to the task scheduling strategy to obtain the execution arrangement sequence.
In some embodiments, the thought-chain-based humanoid robot flexible assembly method further comprises the following steps:
Acquiring monitoring data, wherein the monitoring data comprises environmental information and assembly state data;
Judging whether the subtasks currently executed have abnormal conditions or not according to the monitoring data;
And when the subtasks have abnormal conditions, carrying out task analysis on the subtasks again through an attention mechanism, and dynamically adjusting the assembly plan corresponding to the subtasks.
In some embodiments, the thought-chain-based humanoid robot flexible assembly method further comprises the following steps:
Recording the task tree structure and the assembly state data to obtain an assembly flow result;
And displaying the assembly flow result through an interactive interface.
To achieve the above object, another aspect of the embodiments of the present application provides a thought-chain-based humanoid robot flexible assembly virtual system, the virtual system comprising:
the first module is used for acquiring an assembly task instruction provided by a user;
the second module is used for carrying out task analysis on the assembly task instruction through a thought chain model, calculating the weight of each word in the assembly task instruction through an attention mechanism in the analysis process, dynamically adjusting an analysis target related to the current task according to the weight, and decomposing the task into a plurality of subtasks layer by layer according to the analysis target to obtain a task tree structure;
A third module for traversing the task tree structure to determine an assembly plan, wherein the assembly plan includes an execution arrangement order, a task path plan, and assembly action parameters;
and a fourth module for controlling the humanoid robot to execute the assembly operation according to the assembly plan.
To achieve the above object, another aspect of the embodiments of the present application provides a thought-chain-based humanoid robot flexible assembly hardware system, which includes a memory storing a computer program and a processor that implements the above method when executing the computer program.
To achieve the above object, another aspect of the embodiments of the present application proposes a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above-mentioned method.
The thought-chain-based humanoid robot flexible assembly method, system and medium have the following beneficial effects. An assembly task instruction provided by a user is acquired and analyzed by a thought chain model; during analysis, an attention mechanism calculates the weight of each word in the assembly task instruction, the analysis targets related to the current task are dynamically adjusted according to the weights, and the task is decomposed layer by layer into a plurality of subtasks according to the analysis targets to obtain a task tree structure. Without relying on preset fixed instruction templates or rules, the combination of a hierarchical thought chain and the attention mechanism automatically and accurately extracts key information from complex instructions during task analysis and ignores irrelevant content, adapting to the changeable expressions of natural language and improving the flexibility and accuracy of autonomous task analysis.
The assembly plan is determined by traversing the task tree structure and comprises an execution arrangement sequence, task path planning and assembly action parameters, and the humanoid robot is controlled to execute the assembly operation according to the assembly plan. The task is thus converted into assembly operations and a clear execution plan is generated, which reduces the development and operation costs of a flexible assembly system and realizes intelligent flexible assembly.
Drawings
Fig. 1 is a flowchart of a flexible assembly method of a humanoid robot based on a thought chain provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of a task tree according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a flexible assembly software system of a humanoid robot based on a thought chain according to an embodiment of the present application;
FIG. 4 is a flow chart of the overall operation of the thought chain provided by an embodiment of the present application;
Fig. 5 is a schematic structural diagram of a flexible assembly virtual system of a humanoid robot based on a thought chain according to another embodiment of the present application;
fig. 6 is a schematic diagram of a hardware structure of a flexible assembly hardware system of a humanoid robot based on a thought chain according to an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with embodiments of the application, but are merely examples of apparatuses and methods consistent with aspects of embodiments of the application as detailed in the accompanying claims.
It is to be understood that the terms "first," "second," and the like, as used herein, may be used to describe various concepts, but are not limited by these terms unless otherwise specified. These terms are only used to distinguish one concept from another. For example, the first information may also be referred to as second information, and similarly, the second information may also be referred to as first information, without departing from the scope of embodiments of the present application. The words "if", as used herein, may be interpreted as "when" or "in response to a determination", depending on the context.
The terms "at least one", "a plurality", "each", "any" and the like as used herein, at least one includes one, two or more, a plurality includes two or more, each means each of the corresponding plurality, and any one means any of the plurality.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the application only and is not intended to be limiting of the application.
Before explaining the embodiments of the present application in detail, some nouns and terms involved in the embodiments of the present application are explained first.
(1) Hierarchical thought chain (Hierarchical Chain of Thought, H-CoT): the hierarchical thought chain technique enables a robot to gradually understand the overall framework and specific steps of a task by breaking a complex task down into multiple levels of subtasks.
(2) Attention mechanism (Attention Mechanism): a mechanism by which a system or model can selectively focus on part of the input information while ignoring other irrelevant or secondary information when processing that information. This mechanism mimics the way humans allocate attention when processing complex sensory information, enabling a system or model to process and understand information more efficiently. In deep learning, the attention mechanism is typically implemented as a weight-assignment mechanism that helps the model capture relatively important information by computing the importance of different pieces of information and assigning different weights accordingly.
Next, a part of the related art related to the embodiment of the present application will be described.
In the current flexible manufacturing system, the humanoid robot has the following key problems in terms of task analysis and autonomous assembly, and the problems severely limit the efficiency and the adaptability of the flexible manufacturing system.
(1) The task resolution capability is insufficient, and complex instructions are difficult to understand.
Currently, humanoid robots rely mostly on pre-written instruction scripts or fixed template rules when receiving assembly tasks. Although this method can perform well in highly standardized assembly tasks, it cannot flexibly understand and parse new task instructions in the face of diverse and complex production environments, and cannot cope with sudden changes in production and customization demands. This results in a humanoid robot with poor adaptability to new tasks, requiring a lot of manual intervention, and difficult to achieve true autonomous assembly.
(2) Task planning and execution lacks flexibility and adaptability.
In the related art, a humanoid robot generally adopts a fixed task planning process, and lacks dynamic adjustment capability in the task process. When the production environment or the assembly task changes, the humanoid robot cannot optimize and adjust the assembly flow according to the real-time condition. This results in the problem that in the task execution process, the humanoid robot is prone to motion stiffness and inefficiency, and cannot adapt to the requirements of flexible manufacturing on efficiency and flexibility.
(3) The information is not utilized enough, and the task key points cannot be effectively distinguished.
The task planning method of the humanoid robot generally cannot fully utilize key information in task description, is easily interfered by irrelevant information, and causes low task analysis efficiency. Particularly in complex assembly tasks, the humanoid robot needs to understand a great deal of detailed information, but related technologies have difficulty in automatically identifying and preferentially processing key information, thereby affecting the execution effect of the overall task.
(4) The multi-task parallel processing capability is weak, and the task scheduling efficiency is low.
In a flexible manufacturing environment, a production line needs to process multiple assembly tasks simultaneously, and the current humanoid robot system has limited capability in task scheduling and priority allocation. Because of the lack of analysis capability for importance and urgency of tasks, humanoid robots often cannot reasonably arrange the execution sequence of tasks, resulting in resource waste and production efficiency degradation.
In view of this, an embodiment of the application provides a thought-chain-based humanoid robot flexible assembly method, system and medium. An assembly task instruction provided by a user is acquired and subjected to task analysis by a thought chain model; during analysis, an attention mechanism calculates the weight of each word in the assembly task instruction, the analysis target related to the current task is dynamically adjusted according to the weight, and the task is decomposed layer by layer into a plurality of subtasks according to the analysis target to obtain a task tree structure. Without requiring preset fixed instruction templates, the combination of a hierarchical thought chain and the attention mechanism automatically extracts key information from complex instructions during task analysis and ignores irrelevant content, adapting to the changeable expressions of natural language and improving the flexibility and accuracy of autonomous task analysis.
The assembly plan is determined by traversing the task tree structure and comprises an execution arrangement sequence, task path planning and assembly action parameters, and the humanoid robot is controlled to execute the assembly operation according to the assembly plan. The task is thus converted into assembly operations and a clear execution plan is generated, which reduces the development and operation costs of a flexible assembly system and realizes intelligent flexible assembly.
The embodiment of the application provides a thought-chain-based flexible assembly method for a humanoid robot, relating to the technical field of natural language processing. The method can be applied to a terminal, a server, or software running in the terminal or the server. In some embodiments, the terminal may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a vehicle-mounted terminal, or the like. The server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs, big data and artificial intelligence platforms; the server may also be a node server in a blockchain network. The software may be an application implementing the thought-chain-based humanoid robot flexible assembly method, but the forms above are not limiting.
The application is operational with numerous general purpose or special purpose computer system environments or configurations. Such as a personal computer, a server computer, a hand-held or portable device, a tablet device, a multiprocessor system, a microprocessor-based system, a set top box, a programmable consumer electronics, a network PC, a minicomputer, a mainframe computer, a distributed computing environment that includes any of the above systems or devices, and the like. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
Fig. 1 is an optional flowchart of a mental chain based flexible assembly method of a humanoid robot according to an embodiment of the present application, and the method in fig. 1 may include, but is not limited to, steps S101 to S104.
Step S101, an assembly task instruction provided by a user is obtained.
In the flexible assembly of the humanoid robot, the humanoid robot needs to receive an assembly task instruction provided by a user. The assembly task instruction is a series of instructions provided by an operator to the humanoid robot to guide it in completing a specific assembly task; these instructions generally specify the assembly target object, the required parts, the assembly steps, precision requirements and the like.
Step S102, task analysis is carried out on the assembly task instruction through a thought chain model, the weight of each word in the assembly task instruction is calculated through an attention mechanism in the analysis process, the analysis target related to the current task is dynamically adjusted according to the weight, and the task is decomposed into a plurality of subtasks layer by layer according to the analysis target, so that a task tree structure is obtained.
In some embodiments, after the assembly task instruction is obtained, task analysis of the instruction through a pre-trained thought chain model is required, because the instruction often contains a large amount of complex information: some of it is related to the current task, while some is redundant or irrelevant. The hierarchical thought chain decomposes complex tasks into subtasks layer by layer to form a tree structure that the humanoid robot can understand and execute step by step. However, conventional thought chain techniques lack a flexible attention mechanism when handling diversified assembly tasks and have difficulty coping with dynamic changes of information during the task.
In order to solve the problem, the embodiment introduces an attention mechanism, so that the humanoid robot can focus on key information in the task analysis process and ignore redundant data, thereby improving the precision and the execution efficiency of task decomposition.
The assembly task instruction is analyzed through the thought chain model; in the analysis process, the weight of each word in the instruction is calculated through the attention mechanism, and the analysis target related to the current task is dynamically adjusted according to the weight, so that the humanoid robot can accurately extract key information from the operator's instruction and dynamically adjust how much attention the model pays to different task details. Here, the current task refers to the specific assembly task the humanoid robot is processing or about to process. The task is then decomposed layer by layer into a plurality of subtasks according to the analysis target to obtain the task tree structure.
Specifically, the assembly task instruction is first input into the thought chain model for preliminary analysis. In the analysis process, the attention mechanism helps the humanoid robot extract the main task targets from the complex instruction, yielding several high-level tasks that serve as the high-level nodes of the thought chain. Since the instruction may contain multiple task targets, the attention mechanism calculates the weight of each word in the instruction and determines the key steps of the task from the weight distribution to obtain the analysis targets; the higher the weight, the more likely the word marks a key step of the current task. For example, for the instruction "please mount part A onto component B and then perform accuracy calibration", the high-level tasks include "mount part A to component B" and "accuracy calibration".
It should be noted that the attention mechanism may be learned in advance by training the large language model on a large amount of sample data related to assembly tasks, so that the large language model can identify which steps are critical in a specific scenario or context.
It will be appreciated that the attention mechanism does not directly know which information is the focus of the assembly task instruction; rather, through the use of a pre-trained model combined with training data for the specific task domain, it automatically learns to identify the key steps common to such tasks. For example, after training on a large amount of assembly task data, the model learns to assign weights automatically and dynamically determine the emphasis of the current task, and the attention mechanism adjusts how much attention the model pays to the input information according to these weights, giving higher weight and attention to the key content of the task. For example, a trained thought chain model can recognize that operations such as "mounting parts" and "precision calibration" are usually critical steps in assembly tasks.
With the pre-trained thought chain model and training data from the task domain, the model can flexibly handle different task instructions, and the attention mechanism automatically determines which information in the instruction is key to the current task based on the task input (i.e., the assembly task instruction), so no fixed template rules need to be preset.
And secondly, continuing to recursively decompose each analysis target according to the analysis targets obtained by analysis, and decomposing the high-level task into finer subtasks to form a middle-level node. In the subtask decomposition process, the attention mechanism guides the humanoid robot to pay attention to details related to the current task, avoids interference of irrelevant information, refines the target of the high-level task, for example, the high-level task of 'installing part A to component B' can be decomposed into 'acquiring part A', 'positioning component B', 'installing part A to component B'.
And finally, gradually decomposing to obtain a set comprising high-level tasks and subtasks, wherein all nodes are connected with each other through the dependency relationship of the tasks to form a task tree structure. In the task tree, each node represents a high-level task or subtask, and the connections between nodes represent the dependency relationship of the tasks.
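As an illustration, the task tree described above can be represented with a simple recursive node structure. The following sketch (class and field names are illustrative and not prescribed by this embodiment) builds the tree for the example instruction "please mount part A onto component B and then perform accuracy calibration".

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class TaskNode:
    """One node of the task tree: a high-level task or a subtask."""
    name: str
    children: List["TaskNode"] = field(default_factory=list)  # ordered subtasks
    depends_on: List[str] = field(default_factory=list)       # names of prerequisite tasks

    def add_subtask(self, child: "TaskNode") -> "TaskNode":
        self.children.append(child)
        return child

# High-level nodes extracted from the instruction
root = TaskNode("assemble product")
mount = root.add_subtask(TaskNode("mount part A to component B"))
root.add_subtask(TaskNode("accuracy calibration", depends_on=["mount part A to component B"]))

# Intermediate nodes refining the high-level task
mount.add_subtask(TaskNode("acquire part A"))
mount.add_subtask(TaskNode("position component B"))
mount.add_subtask(TaskNode("install part A onto component B"))
```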
Step S103, traversing the task tree structure to determine an assembly plan, wherein the assembly plan comprises an execution arrangement sequence, task path planning and assembly action parameters.
In particular, the task tree structure is a multi-way tree in which each node may have any number of children. And determining an assembly plan according to the sequence and the dependency relationship of the tasks by traversing the task tree, wherein the assembly plan comprises an execution arrangement sequence, task path planning and assembly action parameters.
Further, kinematic and dynamic constraints of the humanoid robot, as well as various constraints in the environment, may also be considered in the generation of the assembly plan. For example, in the process of generating task path planning, a motion planning algorithm is utilized to generate a feasible motion path so as to avoid collision with an obstacle, and when the assembly action parameters of the grabbing parts are generated, proper grabbing force and gesture are calculated so as to avoid damaging the parts or generating assembly errors.
Step S104, controlling the humanoid robot to execute the assembly operation according to the assembly plan.
Illustratively, the humanoid robot performs the respective task steps step by step in accordance with the generated assembly plan in accordance with the execution arrangement order, for example, in the order of "move to position of part a", "grasp part a", "mount part a to component B", moves from the part storage area to the table in accordance with the task path plan, confirms the position of the part with the vision sensor in accordance with the assembly action parameters, monitors the stress condition in the assembly process with the force sensor, to perform the assembly operation.
In this embodiment, complex tasks are refined layer by layer by introducing the hierarchical thought chain structure, and the humanoid robot can analyze the assembly task instructions input by an operator and dynamically identify the key information in them, so that detailed task decompositions and assembly plans are generated automatically. This improves the flexibility of task analysis, greatly improves the precision with which task details are understood, and enables the humanoid robot to accurately execute various complex assembly tasks. Introducing the attention mechanism into the task analysis process gives the humanoid robot adaptive adjustment capability without relying on fixed templates or predefined rules, solving the problem that existing robot assembly systems struggle to adapt to the changeable expressions of natural language when processing diversified and complex task instructions.
In addition, existing robot assembly systems must reprogram or adjust the task flow when facing a new task, which increases the development time of the system as well as the threshold and cost of use. By contrast, by combining a large language model with multimodal analysis capability, the humanoid robot can understand the operator's natural language description and automatically generate a task plan, greatly simplifying task authoring and adjustment. An operator can issue tasks to the humanoid robot with simple instructions, without writing complex scripts, which reduces the development and operation cost of the system.
In some embodiments, step S101 may include, but is not limited to including, step S201 through step S204.
Step S201, multi-mode data provided by a user is obtained, wherein the multi-mode data comprises text, voice and visual information.
It will be appreciated that the user's multimodal data comprises the various forms of information a user may use when providing an assembly task instruction, including but not limited to describing the specific content and requirements of the assembly task in natural language (e.g., Chinese or English), or expressing intent through voice commands, gestures, schematic drawings and other multimodal information. For the humanoid robot to fully understand this information, the multimodal data provided by the user must first be acquired so that the task input can subsequently be encoded in a unified way.
It should be noted that, the user may provide the multimodal data in one or more modes according to actual needs and preferences, and embodiments of the present application are not limited in particular.
Step S202, word segmentation, part-of-speech tagging and semantic analysis are carried out on the characters and the voices by adopting a natural language processing technology, and main contents and intentions of tasks are extracted to obtain text information.
Optionally, when the multimodal data includes text and speech, natural language processing technology is used to perform word segmentation and part-of-speech tagging on the text and speech and to extract the main content and intent of the task through semantic analysis, so as to obtain the text information.
Illustratively, a voice instruction is first recognized and the speech signal is converted into text. The text is then segmented and part-of-speech tagged using natural language processing, so that the words in the instruction are recognized as nouns, verbs and so on, and semantic analysis is performed to understand the overall meaning and intent of the instruction; this usually involves mapping the segmented text into a semantic frame or using a deep learning model for semantic understanding. For example, for the instruction "please install part A onto component B", segmentation yields "please", "install", "part A", "onto" and "component B"; semantic analysis identifies the main content of the task as "install" and extracts the key information "part A" and "component B".
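For illustration only, the toy rule-based sketch below mimics this parsing step (tokenization, a small hand-written part-of-speech lexicon, and intent/object extraction); a real implementation would use a trained natural language processing model as described above, and the lexicon and function names here are purely hypothetical.

```python
import re

# Toy stand-in for the NLP step: tokenize the instruction, tag tokens with a tiny
# hand-written lexicon, and pull out the action verb and the mentioned parts.
POS_LEXICON = {"install": "VERB", "mount": "VERB", "calibrate": "VERB",
               "part": "NOUN", "component": "NOUN", "onto": "ADP", "to": "ADP"}

def parse_instruction(text: str) -> dict:
    tokens = re.findall(r"[A-Za-z0-9]+", text.lower())
    tagged = [(t, POS_LEXICON.get(t, "X")) for t in tokens]
    verbs = [t for t, pos in tagged if pos == "VERB"]
    # "part A" / "component B" style mentions: a NOUN followed by an identifier token
    objects = [f"{tokens[i]} {tokens[i + 1]}"
               for i, (t, pos) in enumerate(tagged[:-1]) if pos == "NOUN"]
    return {"intent": verbs[0] if verbs else None, "objects": objects, "tokens": tagged}

print(parse_instruction("Please install part A onto component B"))
# {'intent': 'install', 'objects': ['part a', 'component b'], 'tokens': [...]}
```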
And step S203, performing feature extraction and recognition on the visual information by adopting a computer vision technology to obtain symbol information.
Optionally, when the multimodal data includes visual information such as gestures and diagrams, a computer vision technology is used to extract features of the visual information, so as to convert the visual information into symbolized information that can be processed by a computer.
Illustratively, the gesture image is first preprocessed (e.g., denoised, enhanced, etc.), feature information such as shape, motion trajectory, etc. in the gesture image is extracted, and then the extracted features are classified and identified by using a recurrent neural network to convert the features into symbol information.
And step S204, mapping the text information and the symbol information to a vector space to obtain an assembly task instruction.
Specifically, text information and symbol information obtained by converting all multi-mode data are mapped to a unified high-dimensional vector space through an embedding layer to form multi-mode characteristic representation of task input, an assembly task instruction is obtained, and a foundation is provided for subsequent task analysis.
In step S204 of another embodiment, when the multimodal data consists only of text or speech, the text or speech is word-segmented, part-of-speech tagged and semantically parsed to extract the main content and intent of the task and obtain the text information, and the text information is mapped to the vector space to obtain the assembly task instruction.
Or when the multi-mode data is only composed of visual information, the visual information is subjected to feature extraction and identification by adopting a computer visual technology to obtain symbol information, and the symbol information is mapped to a vector space to obtain an assembly task instruction.
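As a minimal sketch of this mapping step, the code below (vocabulary sizes, embedding dimension and example ids are arbitrary placeholders) projects token ids from the text channel and symbol ids from the vision channel into a shared vector space and concatenates them into one multimodal instruction representation.

```python
import torch
import torch.nn as nn

VOCAB_SIZE, SYMBOL_SIZE, DIM = 1000, 50, 64    # illustrative sizes

text_embed = nn.Embedding(VOCAB_SIZE, DIM)     # embeds word/token ids
symbol_embed = nn.Embedding(SYMBOL_SIZE, DIM)  # embeds recognized gesture/diagram symbols

text_ids = torch.tensor([[12, 7, 231, 88]])    # e.g. ids for "install part A component-B"
symbol_ids = torch.tensor([[3]])               # e.g. a recognized "point-at-fixture" gesture

# Concatenate along the sequence dimension: 4 text tokens + 1 symbol token
instruction = torch.cat([text_embed(text_ids), symbol_embed(symbol_ids)], dim=1)
print(instruction.shape)  # torch.Size([1, 5, 64])
```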
In step S102 of some embodiments, calculating the weight of each term in the assembly task instruction by the attention mechanism, dynamically adjusting the resolution objective related to the current task according to the weight may include, but is not limited to including, step S301 to step S303.
In step S301, an attention score of each word in the assembly task instruction is calculated based on the context information by an attention mechanism, wherein the context information includes a task history state and domain knowledge.
In particular, attention mechanisms were originally used in the field of neural machine translation for dynamically focusing on different parts of an input sequence as sequence data is processed, and introduction of attention mechanisms allows models to automatically identify and focus on key information in instructions during task parsing.
However, conventional scoring functions generally adopt a simple matching approach. To improve the understanding of complex instructions, this embodiment improves the scoring function by introducing context information into the attention scoring process: the attention score of each word in the assembly instruction is calculated based on the context information through the attention mechanism, where the context information comprises the task history state of the previous time step and the domain knowledge related to the task.
The scoring function $e_i$ is expressed as formula (1):

$$e_i = \mathrm{Score}(h_i, S_{t-1}) = v^{T}\tanh\left(W_h h_i + W_s S_{t-1} + W_k k_i\right) \tag{1}$$

In formula (1), $h_i$ is the hidden state of the $i$-th word in the input sequence, typically produced by an encoder; $S_{t-1}$ is the decoder state of the previous time step, reflecting the historical context of the model; $k_i$ is a domain knowledge vector representing the domain knowledge related to the task; $W_h$, $W_s$ and $W_k$ are weight matrices, i.e. parameters optimized through learning; and $v$ is a learnable weight vector that weights and summarizes the output of the $\tanh$ function.
Illustratively, the assembly task instruction is first received as input, the task state of the previous step is obtained, and the domain knowledge base is queried for background information related to the current task. The input information is then scored word by word or segment by segment with the scoring function to evaluate the importance of each word.
When an instruction is parsed for the first time, an initial or default value may be set as the decoder state of the previous time step and used by the scoring function together with the currently input words.
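The sketch below transcribes the scoring function of formula (1) directly; the vector dimension and the randomly initialized weights are placeholders used only to illustrate the computation.

```python
import torch

def attention_score(h_i, s_prev, k_i, W_h, W_s, W_k, v):
    """Additive attention score of formula (1):
    e_i = v^T tanh(W_h h_i + W_s S_{t-1} + W_k k_i)."""
    return v @ torch.tanh(W_h @ h_i + W_s @ s_prev + W_k @ k_i)

d = 8                                                    # illustrative hidden size
W_h, W_s, W_k = (torch.randn(d, d) for _ in range(3))    # learnable weight matrices
v = torch.randn(d)                                       # learnable weight vector
h_i = torch.randn(d)      # encoder hidden state of word i
s_prev = torch.zeros(d)   # decoder state of the previous step (initial value on first parse)
k_i = torch.randn(d)      # domain-knowledge vector for word i

print(float(attention_score(h_i, s_prev, k_i, W_h, W_s, W_k, v)))
```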
Step S302, normalization processing is carried out on the attention scores to obtain the weights of the words.
Specifically, in the task decomposition process of each level, the humanoid robot normalizes the attention score of each word, and calculates the weight of each word.
The weight $\alpha_i$ is calculated as formula (2):

$$\alpha_i = \frac{\exp(e_i)}{\sum_{j=1}^{n}\exp(e_j)} \tag{2}$$

In formula (2), $\alpha_i$ denotes the attention weight of the $i$-th word, $e_i$ is the scoring function evaluating the importance of the current word to task analysis, $n$ is the length of the input sequence, and $e_j$ is the score of the $j$-th word in the input sequence.
Illustratively, for the instruction "please mount part a onto component B and then perform accuracy calibration", the attention mechanism will give a higher weight to "mount part a to component B" and "perform accuracy calibration".
In step S303, the word with the weight greater than the preset threshold is determined as the resolution target related to the current task.
Optionally, according to the calculated weights, the words in the assembly task instruction whose weights are greater than a preset threshold are determined as the analysis targets related to the current task. For example, "mount part A" may be weighted more heavily than other, unrelated descriptions because of its relevance to the assembly, and the attention mechanism accordingly determines it, through this weight, to be a key step of the current task.
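For illustration, the sketch below normalizes a set of hypothetical token scores with the softmax of formula (2) and keeps the words whose weights exceed a preset threshold; the scores and the threshold value are assumptions chosen for the example.

```python
import torch

def attention_weights(scores: torch.Tensor) -> torch.Tensor:
    """Formula (2): normalize raw scores e_1..e_n into weights alpha_i with softmax."""
    return torch.softmax(scores, dim=-1)

# Hypothetical scores for the tokens of "please mount part A onto component B then calibrate"
tokens = ["please", "mount", "part A", "onto", "component B", "then", "calibrate"]
scores = torch.tensor([0.1, 2.3, 1.9, 0.2, 1.8, 0.1, 2.1])

weights = attention_weights(scores)
THRESHOLD = 0.15                                         # preset threshold (illustrative)
targets = [t for t, w in zip(tokens, weights) if w > THRESHOLD]
print(targets)  # ['mount', 'part A', 'component B', 'calibrate']
```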
In step S303 of another embodiment, the words may instead be sorted by weight, and the top words within a preset number determined as the analysis targets of the current task.
In this embodiment, the system does not need to build a fixed subtask library to match, but rather, by introducing an attention mechanism, the model can flexibly cope with different task instructions in the task analysis process, and automatically identify and focus on key information in the instructions according to task input and context, so that a weight is assigned to each step, and the higher the weight is, the more likely the step is the focus step in the current task, thereby improving understanding of the humanoid robot on complex assembly tasks.
In some embodiments, step S103 may include, but is not limited to including, step S401 to step S405.
Step S401, traversing the task tree structure to determine the sequence and the dependency relationship of the subtasks.
Common traversal methods for a multi-way tree include depth-first traversal and breadth-first traversal. Depth-first traversal follows each branch of the tree from the root node down to its child nodes before moving on to the other branches, and is further divided into pre-order traversal, which visits the root node before its children, and post-order traversal, which processes the children before the root. Breadth-first traversal visits the nodes layer by layer from top to bottom.
Referring to Fig. 2, given an N-ary tree, node 1 is the root of the entire tree with children node 3, node 2 and node 4, and node 3 is the parent of node 5 and node 6.
When the task tree is traversed by pre-order traversal, the visiting order is node 1, node 3, node 5, node 6, node 2, node 4.
When the task tree is traversed by post-order traversal, the visiting order is node 5, node 6, node 3, node 2, node 4, node 1.
When traversing the task tree by breadth-first traversal, the access sequence is node 1, node 3, node 2, node 4, node 5, node 6.
The sequence and dependency relationships of the subtasks are determined by traversing all root nodes and child nodes of the task tree structure.
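A minimal sketch of the three traversal orders on the Fig. 2 example tree (same node numbering as above) is given below.

```python
from collections import deque

# N-ary task tree from Fig. 2: node 1 has children 3, 2, 4; node 3 has children 5, 6.
tree = {1: [3, 2, 4], 3: [5, 6], 2: [], 4: [], 5: [], 6: []}

def preorder(node):   # visit the root before its children
    order = [node]
    for child in tree[node]:
        order += preorder(child)
    return order

def postorder(node):  # process the children before the root
    order = []
    for child in tree[node]:
        order += postorder(child)
    return order + [node]

def bfs(root):        # breadth-first, layer by layer
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(tree[node])
    return order

print(preorder(1))   # [1, 3, 5, 6, 2, 4]
print(postorder(1))  # [5, 6, 3, 2, 4, 1]
print(bfs(1))        # [1, 3, 2, 4, 5, 6]
```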
Step S402, according to the sequence and the dependency relationship of the subtasks, carrying out logic verification on the task tree structure, and optimizing the subtask arrangement in the task tree structure based on the attention mechanism to obtain the execution arrangement sequence.
Specifically, the task tree structure is logically verified according to the sequence and dependency relationships of the subtasks: the generated task steps are checked against a domain knowledge base, unreasonable operation steps are eliminated, and the rationality of the task dependencies and execution conditions produced by task decomposition is ensured. The subtask arrangement in the task tree structure is then optimized based on the attention mechanism, adjusting the node order in the task tree to optimize execution efficiency and obtain the execution arrangement sequence.
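As one illustrative piece of such logic verification, the sketch below checks that a candidate execution arrangement sequence respects the subtask dependency relationships; the task names and the dependency table are hypothetical, and the attention-based reordering itself is not shown.

```python
def verify_order(order, depends_on):
    """Check that every subtask is scheduled after all of its prerequisites."""
    position = {task: i for i, task in enumerate(order)}
    violations = [(task, dep) for task, deps in depends_on.items()
                  for dep in deps if position[dep] > position[task]]
    return violations  # an empty list means the ordering is logically consistent

order = ["acquire part A", "position component B", "install part A", "accuracy calibration"]
depends_on = {
    "install part A": ["acquire part A", "position component B"],
    "accuracy calibration": ["install part A"],
}
print(verify_order(order, depends_on))  # [] -> no dependency is violated
```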
Step S403, determining task path planning by adopting a motion planning algorithm according to the execution arrangement sequence.
Specifically, according to the execution arrangement sequence, the system generates a feasible motion path using a motion planning algorithm (such as a rapidly-exploring random tree (RRT) algorithm or the A* algorithm) in combination with the joint constraints of the humanoid robot and an environment map, so as to determine the task path planning.
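As a simplified stand-in for this step, the sketch below runs A* on a small 2D occupancy grid; a real humanoid planner would operate in joint space under the kinematic constraints mentioned above, and the workcell grid, start and goal here are purely illustrative.

```python
import heapq

def astar(grid, start, goal):
    """Minimal A* on a 2D occupancy grid (0 = free, 1 = obstacle), Manhattan heuristic."""
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    open_set = [(h(start), 0, start, [start])]   # (f, g, node, path)
    seen = set()
    while open_set:
        _, cost, node, path = heapq.heappop(open_set)
        if node == goal:
            return path
        if node in seen:
            continue
        seen.add(node)
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0 and (nr, nc) not in seen:
                heapq.heappush(open_set, (cost + 1 + h((nr, nc)), cost + 1, (nr, nc), path + [(nr, nc)]))
    return None  # no collision-free path exists

workcell = [[0, 0, 0, 0],
            [0, 1, 1, 0],   # 1 = an obstacle between the part storage area and the table
            [0, 0, 0, 0]]
print(astar(workcell, start=(0, 0), goal=(2, 3)))
```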
Step S404, decomposing the subtasks into a plurality of basic actions which can be directly executed by the humanoid robot, and determining required assembly action parameters according to the basic actions.
Specifically, after determining the task execution sequence and the task path, the subtask is continuously decomposed by using an attention mechanism, each subtask is decomposed into specific operations to be executed on each task node until the specific operations are decomposed into a plurality of basic actions which can be directly executed by the humanoid robot, and the required assembly action parameters such as position information, force control parameters and the like are determined according to the basic actions.
Illustratively, a force control algorithm is utilized to ensure that the humanoid robot applies proper force during assembly, avoiding damage to parts or assembly errors.
Step S405, obtaining an assembly plan according to the execution arrangement sequence, the task path plan and the assembly action parameters.
Specifically, the execution arrangement sequence, the task path planning and the assembly action parameters are integrated together to form a complete assembly plan, and all steps and actions required to be executed by the humanoid robot in the whole assembly process and specific parameters of each step and action are described in detail in the assembly plan, so that the humanoid robot is controlled to gradually complete each subtask according to the planned path and action sequence in the execution process.
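A minimal sketch of such an integrated plan object is shown below; the field names and example values are illustrative and not part of this embodiment.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class AssemblyPlan:
    execution_order: List[str]                                # subtasks in execution order
    task_paths: Dict[str, List[Tuple[float, float, float]]]   # waypoints per subtask
    action_parameters: Dict[str, Dict[str, float]]            # e.g. grip force, approach speed

plan = AssemblyPlan(
    execution_order=["move to part A", "grasp part A", "mount part A to component B"],
    task_paths={"move to part A": [(0.0, 0.0, 0.2), (0.3, 0.1, 0.2)]},
    action_parameters={"grasp part A": {"grip_force_N": 12.0, "approach_speed_mps": 0.05}},
)
print(plan.execution_order)
```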
In some embodiments, step S402 may include, but is not limited to including, step S501 through step S502.
In step S501, when there are multiple subtasks to be executed at the same time, the priority and resource requirement of each subtask are evaluated by the attention mechanism, and the evaluation result is obtained.
Specifically, when a plurality of subtasks to be executed exist at the same time, all the subtasks to be executed at the same time are analyzed as a set, and the priority and the resource requirement of each subtask are evaluated through an attention mechanism, so that an evaluation result is obtained.
Step S502, determining a task scheduling strategy through a reinforcement learning algorithm according to the evaluation result, and adjusting subtask arrangement in a task tree structure according to the task scheduling strategy to obtain an execution arrangement sequence.
Further, a task scheduling strategy is determined by a reinforcement learning algorithm according to the evaluation result. For example, when two subtasks need to use the same tool, their execution order must be arranged or an alternative scheme found. The subtask arrangement in the task tree structure is then dynamically adjusted according to the task scheduling strategy, adaptively adding new tasks or changing the original subtask order, to obtain the execution arrangement sequence.
Illustratively, reinforcement learning algorithms (e.g., Q-learning or deep reinforcement learning DQN) dynamically adjust and optimize the multi-task scheduling strategy according to the priorities of tasks, resource requirements, task dependencies, etc. in the evaluation results through continuous learning and feedback loops to ensure reasonable resource allocation and maximize overall efficiency.
The current task execution state, the resource usage situation and the task dependency relationship are firstly encoded into a state vector, and the state vector is used as the input of reinforcement learning, wherein the state information comprises the current progress of each task, the resource requirement of the task, the availability of the resource and the like.
Second, an action space is determined, which is defined as the scheduling decisions that can be taken in each state. For example, it may be selected which task to execute preferentially, to pause a task, or to change the order of tasks, etc. Because the selection of each action can influence the smoothness of task execution, reinforcement learning evaluates the effect of each scheduling decision through a reward mechanism, sets a reward value for each action, and the higher the scheduling efficiency is, the larger the reward is. The rewards may be based on a number of metrics such as completion time of the task, utilization of resources, reduction of conflicts, and the like. If the scheduling is successfully optimized, the task parallelism is improved, the waiting time is reduced, and higher rewards are obtained.
The reinforcement learning algorithm learns which scheduling decisions are optimal solutions according to rewards feedback in different states by trying different scheduling strategies. The updated scheduling strategy is repeatedly executed, so that the scheduling can be efficiently performed when the same or similar tasks are encountered in the future.
It will be appreciated that the reinforcement learning algorithm allows the system to adjust in real time based on feedback during task execution. If a new task is added or an existing task is changed, the system dynamically evaluates the new state and selects an appropriate scheduling policy. The system gradually learns the optimal multi-task scheduling method through continuous exploration and adjustment.
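The toy tabular Q-learning sketch below illustrates the idea on a two-subtask scheduling problem in which the higher-priority subtask should run first; the state, action and reward encodings are simplified stand-ins for the evaluation results described above, and a practical system might instead use a DQN as noted.

```python
import random
from collections import defaultdict

# Toy problem: two pending subtasks share one tool; finishing with the ordering
# "calibrate" -> "fetch" yields a higher reward, so Q-learning should learn to
# schedule the high-priority subtask "calibrate" first.
TASKS = ("calibrate", "fetch")
REWARD = {("calibrate", "fetch"): 1.0, ("fetch", "calibrate"): 0.2}

ALPHA, GAMMA, EPS = 0.5, 0.9, 0.2
Q = defaultdict(float)  # Q[(state, action)], state = tuple of still-pending subtasks

def choose(state):
    if random.random() < EPS:                      # epsilon-greedy exploration
        return random.choice(state)
    return max(state, key=lambda a: Q[(state, a)])

for _ in range(500):                               # training episodes
    state, schedule = TASKS, []
    while state:
        action = choose(state)
        schedule.append(action)
        next_state = tuple(t for t in state if t != action)
        reward = REWARD[tuple(schedule)] if not next_state else 0.0
        best_next = max((Q[(next_state, a)] for a in next_state), default=0.0)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = next_state

print(max(TASKS, key=lambda a: Q[(TASKS, a)]))  # learned first subtask: 'calibrate'
```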
In this embodiment, by combining the analysis result of the thought chain and the reinforcement learning algorithm, an intelligent multi-task scheduling mechanism is established, the priority and resource requirement of each task are evaluated by using the attention mechanism, and the execution sequence of the tasks is optimized, so that the tasks with high priority are processed in time, and the resources are reasonably arranged. The multi-task scheduling strategy improves the parallel processing capability of the humanoid robot, improves the overall working efficiency of the flexible assembly production line, and can effectively solve the problems of resource waste and production bottleneck caused by the fact that the prior robot system generally lacks the evaluation and scheduling capability of task priority in a multi-task environment.
In some embodiments, the thought chain based humanoid robot flexible assembly method may further include, but is not limited to including, step S601 to step S603.
In step S601, monitoring data is acquired, wherein the monitoring data includes environmental information and assembly status data.
In particular, assembly tasks often involve complexity and uncertainty, and changes in the environment may affect the smooth execution of a task. Real-time monitoring through various sensors is therefore required to acquire monitoring data, giving the system real-time feedback and adaptive adjustment capability, where the monitoring data include environmental information and assembly state data. For example, a visual sensor may detect the position of a part to obtain visual data, and a force sensor may sense the forces arising during assembly to obtain tactile data.
Step S602, judging whether the currently executed subtask has abnormal conditions according to the monitoring data.
Optionally, whether the currently executed subtask has an abnormal condition is judged according to the monitoring data obtained by the various sensors.
For example, whether a position deviation exists is judged from the part position monitored by the visual sensor, and whether the force exceeds or falls below a preset safety range is judged from the force condition perceived by the force sensor.
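By way of illustration, such threshold checks over the monitoring data could be sketched as below; the field names and numeric thresholds are assumptions chosen only for demonstration.

```python
# Hypothetical thresholds; real values would come from the assembly process specification.
MAX_POSITION_DEVIATION_MM = 2.0
FORCE_SAFE_RANGE_N = (5.0, 40.0)

def detect_anomaly(monitoring_data: dict) -> list[str]:
    """Return a list of anomaly descriptions found in the current monitoring sample."""
    anomalies = []
    deviation = monitoring_data.get("position_deviation_mm", 0.0)
    if deviation > MAX_POSITION_DEVIATION_MM:
        anomalies.append(f"part position deviation {deviation:.1f} mm exceeds tolerance")
    force = monitoring_data.get("assembly_force_n")
    if force is not None and not (FORCE_SAFE_RANGE_N[0] <= force <= FORCE_SAFE_RANGE_N[1]):
        anomalies.append(f"assembly force {force:.1f} N outside the safe range")
    return anomalies
```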
Step S603, when the subtask has abnormal conditions, the subtask is subjected to task analysis again through an attention mechanism, and an assembly plan corresponding to the subtask is dynamically adjusted.
Specifically, when an abnormal condition of the subtask is detected, an adaptive adjustment mechanism is triggered: the subtask is re-analysed through the attention mechanism, and the assembly plan corresponding to the subtask is dynamically adjusted. The adaptive adjustment works in two ways: on the one hand, the attention mechanism is used to re-evaluate the focus of the task and adjust task analysis and planning; on the other hand, the execution plan is modified in real time and corrective measures are taken. For example, if a part position discrepancy is found during assembly, the system may temporarily suspend the current action, reposition the part, and update the subsequent assembly path and action parameters. This real-time feedback and adjustment mechanism ensures that the humanoid robot can still complete the assembly task efficiently and accurately in a dynamic environment, improving the robustness of the system.
It will be appreciated that when an anomaly is detected, the problem is typically limited to the currently executing subtask, so only that subtask is re-evaluated through the attention mechanism and adjusted accordingly. The adaptive adjustment mechanism usually re-evaluates only the subtasks in which anomalies are detected, which improves efficiency and avoids unnecessary re-planning of the entire task. Only when an anomaly affects the whole task chain (e.g., a critical equipment failure) is a global re-evaluation of the task triggered and a full adjustment made.
It should be noted that, during the task execution process of some embodiments, the humanoid robot may dynamically adjust the attention weight according to the real-time feedback, and re-evaluate the key part in the task, for example, when an assembly error is detected, the attention mechanism may assign more weight to the relevant assembly step, and dynamically adjust the attention focus, so as to prompt the humanoid robot to perform important inspection and correction.
Further, in step S603 of some embodiments, in order to make full use of multi-modal information such as vision and touch, the scoring function is further improved: the improved attention mechanism fuses the multi-modal data, yielding the new scoring function shown in formula (3):
e_i = v^T tanh(W_h h_i + W_s s_{t-1} + W_v v_i + W_t t_i)    (3),
in formula (3), W_v and W_t are weight matrices, which are parameters to be optimized through learning, v_i is the feature vector of the visual data, and t_i is the feature vector of the tactile data.
Optionally, the data of the visual sensor, the tactile sensor and the like are integrated through the multi-mode data fusion module to generate a unified characteristic representation, so that the humanoid robot is helped to better sense the environment, the humanoid robot is supported to accurately position parts and adjust assembly force in the task analysis and assembly process, and the stability and the accuracy of the assembly process are improved.
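By way of a non-limiting illustration, formula (3) could be evaluated as in the following Python sketch; the vector dimensions, random initial weights and NumPy implementation are assumptions made purely for demonstration.

```python
import numpy as np

def attention_score(h_i, s_prev, v_i, t_i, W_h, W_s, W_v, W_t, v):
    """Additive attention score e_i of formula (3): fuses the encoder hidden state,
    the previous decoder state, the visual feature and the tactile feature."""
    return float(v @ np.tanh(W_h @ h_i + W_s @ s_prev + W_v @ v_i + W_t @ t_i))

# Illustrative dimensions: an 8-dim attention space over 4-dim feature vectors.
rng = np.random.default_rng(0)
d, k = 8, 4
W_h, W_s, W_v, W_t = (rng.normal(size=(d, k)) for _ in range(4))
v = rng.normal(size=d)
feats = [tuple(rng.normal(size=k) for _ in range(4)) for _ in range(3)]
scores = np.array([attention_score(h, s, vi, ti, W_h, W_s, W_v, W_t, v)
                   for h, s, vi, ti in feats])
weights = np.exp(scores) / np.exp(scores).sum()   # softmax normalisation into attention weights
```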
In this embodiment, it is considered that existing humanoid robots often require manual intervention and adjustment when facing problems such as position deviation or changes in part specifications, which reduces production efficiency. During assembly, the attention weights are dynamically adjusted according to the monitoring data fed back in real time (such as vision and force-sense information), the key points of the task are re-evaluated, and the execution steps are automatically optimized, so that the humanoid robot can rapidly adjust itself when facing environmental changes in the assembly process. This realizes the self-adaptation capability of the humanoid robot to a changeable environment, reduces human intervention, and improves the stability and continuity of the production line.
In some embodiments, the thought chain based humanoid robot flexible assembly method may further include, but is not limited to including, step S701 to step S702.
Step S701, recording task tree structure and assembly state data to obtain an assembly flow result.
Specifically, all nodes of the task tree structure obtained in the task analysis process are recorded by adopting a tree structure or a linked list, including a task target, an execution step and a dependency relationship of each node, and the collected assembly state data are associated with corresponding nodes in the task tree structure, so that an assembly flow result is obtained by summarizing.
Alternatively, the assembly flow results may be generated in the form of an assembly report listing the execution status, time consumption, execution progress of each step, and any related anomalies or errors in the report.
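As a non-limiting sketch of the kind of record behind such a report, the node fields and the indented text format below are assumptions for illustration only:

```python
from dataclasses import dataclass, field

@dataclass
class TaskNode:
    """One node of the recorded task tree, with its execution status."""
    name: str
    status: str = "pending"          # e.g. pending / running / done / failed
    duration_s: float = 0.0
    anomalies: list = field(default_factory=list)
    children: list = field(default_factory=list)

def render_report(node: TaskNode, depth: int = 0) -> str:
    """Flatten the recorded task tree into a simple indented assembly report."""
    line = f"{'  ' * depth}- {node.name}: {node.status} ({node.duration_s:.1f}s)"
    if node.anomalies:
        line += f" anomalies={node.anomalies}"
    return "\n".join([line] + [render_report(c, depth + 1) for c in node.children])
```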
Step S702, the assembly flow result is displayed through the interactive interface.
It can be understood that, in order to facilitate the operator to view the analysis result and the execution state of the task in real time, the assembly flow result can be displayed through the interactive interface, and the complex assembly process is presented in an intuitive and understandable manner by utilizing multimedia elements such as charts, animations and the like. For each step, the interactive interface can also provide functions of zooming in, zooming out, dragging, screening, searching and the like, so that an operator can check and analyze the assembly flow result according to the requirement.
The following describes and illustrates the embodiments of the present invention in detail with reference to specific application examples.
In order to realize the flexible assembly method of the humanoid robot based on the thought chain provided by the embodiment of the application, and to solve the problems of insufficient flexibility and self-adaptation capability of humanoid robots in complex task analysis and assembly processes in the related technology, referring to fig. 3, the embodiment of the application provides a thought-chain-based humanoid robot flexible assembly method software system. The system adopts a modularized design and comprises a multi-modal task input and encoding module, an attention-mechanism hierarchical thought chain task analysis module, a task planning and execution module, a real-time feedback and self-adaptive adjustment module, a multi-task scheduling and optimizing module, and a system control and man-machine interaction interface module.
The functional modules communicate through standard interfaces, for example, a task analysis module, a task planning and execution module and a real-time feedback and self-adaptive adjustment module work cooperatively through message transmission and data sharing.
The multi-modal task input and encoding module is configured to receive assembly task instructions issued by an operator, which may include various forms of information, such as text descriptions, voice instructions, and even gestures or schematics. In order to facilitate humanoid robot processing, task input can be uniformly encoded through a multi-mode task input and encoding module.
In the implementation process, the system adopts a natural language processing technology to segment words, mark parts of speech and analyze semantics of instructions in a text or voice form, and extracts main contents and intentions of tasks. Meanwhile, for visual input, such as gestures or schematic diagrams, computer vision technology is utilized to perform feature extraction and recognition, and the feature extraction and recognition are converted into processable symbolized information. Finally, mapping the data of all modes to a unified high-dimensional vector space through an embedding layer to form multi-mode characteristic representation of task input, and providing a basis for subsequent task analysis.
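A compressed, non-limiting sketch of this unified encoding step is given below; the toy word embeddings, fixed feature dimensions and single projection matrix are assumptions made for illustration and stand in for the actual embedding layer.

```python
import numpy as np

TEXT_DIM, VIS_DIM, GES_DIM, TASK_DIM = 32, 16, 16, 64
rng = np.random.default_rng(42)
vocab = {}                                   # toy word-to-vector table, built on demand
W_proj = rng.normal(size=(TASK_DIM, TEXT_DIM + VIS_DIM + GES_DIM))   # shared embedding layer

def embed_text(tokens):
    """Average toy word embeddings into one text feature vector."""
    vecs = [vocab.setdefault(t, rng.normal(size=TEXT_DIM)) for t in tokens]
    return np.mean(vecs, axis=0)

def encode_task(tokens, visual_feat, gesture_feat):
    """Project text, visual and gesture features into one unified task representation."""
    fused = np.concatenate([embed_text(tokens), visual_feat, gesture_feat])
    return W_proj @ fused

# Example: "mount part A onto component B" with dummy visual and gesture features.
task_vec = encode_task("mount part A onto component B".split(),
                       np.zeros(VIS_DIM), np.zeros(GES_DIM))
```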
The hierarchical thinking chain task analysis module of the attention mechanism is used for carrying out a high-level task analysis stage by utilizing the encoded assembly task instruction, and analyzing the task instruction word by word or segment by segment through the attention mechanism, so that key information in the assembly task instruction is automatically identified, and a plurality of high-level tasks are obtained.
And then, carrying out recursive subtask decomposition on each high-level task, and guiding the humanoid robot to pay attention to details related to the current subtask in the subtask decomposition process so as to avoid interference of irrelevant information. The attention mechanism continues to function, directing the model to focus on detailed information related to the current subtask. For example, in decomposing "mount part a to component B", the attention mechanism may focus the model on the necessary information such as the attribute of "part a," the position of component B "and the like.
When task decomposition is performed, the model parses the task instruction word by word or segment by segment through a recurrent neural network (such as an LSTM or GRU) or a Transformer architecture combined with an attention mechanism. In each parsing step, the model dynamically calculates the attention weights according to the task state of the previous step and the current input to determine the next content to focus on, thereby flexibly adjusting the parsing path and ensuring deep understanding of the task.
Finally, through multi-level task decomposition, the system generates specific task execution steps according to the key information to obtain a complete task tree structure, wherein the task tree contains detailed information from a high-level task target to the specific execution steps, and a foundation is laid for subsequent task planning and execution.
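By way of illustration only, the layer-by-layer decomposition into a task tree could be represented as in the following sketch, where the decomposition table is a hypothetical stand-in for whatever the attention-guided model would actually produce:

```python
# Hypothetical decomposition table standing in for the attention-guided model output.
DECOMPOSITION_RULES = {
    "mount part A to component B": ["locate part A", "grasp part A",
                                    "move to component B", "insert and fasten"],
    "grasp part A": ["open gripper", "approach part A", "close gripper"],
}

def decompose(task: str) -> dict:
    """Recursively expand a task into a tree until only basic actions remain."""
    children = [decompose(sub) for sub in DECOMPOSITION_RULES.get(task, [])]
    return {"task": task, "children": children}

tree = decompose("mount part A to component B")   # nested dict = task tree structure
```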
After the task planning and executing module obtains the task tree generated by the task analysis module, a specific assembly plan is generated according to the sequence and the dependency relationship of the tasks by traversing the task tree, including the generation of the execution arrangement sequence, the task path planning and the assembly action parameters. And generating detailed assembly steps by utilizing a task tree structure, and guiding the humanoid robot to execute operations such as moving, grabbing, installing and the like by a motion planning algorithm, so as to ensure that the assembly task is efficiently completed according to a preset flow.
In the execution process, in order to improve the accuracy of execution, sensor feedback control can be added in key steps, for example, a visual sensor is used for confirming the position and the gesture of a part, and a force sensor is used for monitoring the stress condition in the assembly process so as to adjust the action of the humanoid robot in real time.
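The ordering step can be pictured with the following sketch; the dependency map and the use of graphlib's topological ordering are an illustrative assumption and are not the motion planning algorithm referred to above.

```python
from graphlib import TopologicalSorter

# Hypothetical subtask dependency map derived from the task tree:
# each key lists the subtasks that must finish before it may start.
dependencies = {
    "grasp part A":        {"locate part A"},
    "move to component B": {"grasp part A"},
    "insert and fasten":   {"move to component B", "verify pose of component B"},
}

execution_order = list(TopologicalSorter(dependencies).static_order())
# e.g. ['locate part A', 'verify pose of component B', 'grasp part A',
#       'move to component B', 'insert and fasten']
```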
The real-time feedback and self-adaptive adjustment module is used for monitoring the state and environment information of the humanoid robot in real time through various sensors in the execution stage and adjusting task planning according to feedback. Through information such as position deviation, force feedback in the real-time monitoring assembly process, the system can readjust task execution plan when detecting the abnormality, ensures that the task is accomplished smoothly to realize humanoid robot's self-adaptation ability to changeable environment.
The multi-task scheduling and optimizing module is used to reasonably schedule and optimize multiple tasks when the humanoid robot needs to process several assembly tasks simultaneously. The implementation first analyses the set of tasks to be executed and evaluates the priority and resource requirements of each task using the attention mechanism, and then learns the optimal task scheduling strategy in combination with the reinforcement learning algorithm, so that the humanoid robot's resources are reasonably allocated and the overall assembly efficiency is maximized.
In the scheduling process, the system needs to consider the dependency relationship between tasks, avoid resource conflict, ensure the timely processing of the high-priority tasks, realize the reasonable parallel execution of the multiple tasks, optimize the resource allocation and improve the overall assembly efficiency.
The system control and man-machine interaction interface module is used for providing man-machine interaction and supporting the input of task instructions and the real-time monitoring assembly process. An operator can input a task instruction through a natural language on a user interface, and can check an analysis result, an execution state, progress and an assembly result of a task in real time, so that the operator can conveniently monitor in real time. The interface also provides visual assembly flow display, helps the operator to know the working condition of the humanoid robot, and can intervene on the humanoid robot through the interface if necessary, and provides additional instructions or adjusts task parameters.
In the embodiment, by using a hierarchical thinking chain technology and combining an attention mechanism, the humanoid robot can automatically analyze complex natural language assembly instructions, convert the complex natural language assembly instructions into specific operation steps, and realize efficient and accurate task analysis. Along with the rapid development of large models (such as GPT-4 and the like) and natural language processing technologies, the task analysis method based on the pre-training language model shows strong understanding and reasoning capability, and by utilizing the large models, the humanoid robot can extract key task information in complex natural language instructions, so that automatic analysis and execution of tasks are realized, the requirement of manual intervention is greatly reduced, and the understanding capability of the humanoid robot on diversified assembly tasks is improved.
By introducing an attention mechanism, the humanoid robot can pay attention to key information preferentially in the task analysis process, ignores irrelevant contents, and improves the accuracy and speed of task execution. The humanoid robot can more efficiently utilize useful information in task description when facing complex tasks, optimize information processing and realize task key recognition.
The humanoid robot reasonably schedules and executes various assembly tasks according to the priority of the tasks and the availability of resources, optimizes the task execution sequence, reduces the resource waste in the production process, and thus improves the overall working efficiency of the flexible assembly production line.
In addition, in the task execution process, the humanoid robot can dynamically adjust operation steps according to real-time feedback, adapt to the change of assembly environment and task requirements, and ensure high-efficiency operation in a complex production environment. Therefore, the flexibility and adaptability of the production line can be obviously improved, and the adaptability of task planning and execution is enhanced.
Referring to fig. 4, fig. 4 provides a flowchart of overall operation of a thought chain, and the hierarchical thought chain structure is utilized to refine complex tasks layer by layer, and perform logic verification in combination with multi-modal sensing data, so as to ensure the rationality and accuracy of a task decomposition process.
Illustratively, a natural language instruction input by an operator is received and preliminarily parsed by the task input and preliminary parsing module to generate a preliminary semantic representation. The task is then analysed layer by layer using the hierarchical thought chain structure. When analysing the task, the system combines the input multi-modal data (such as the natural language description and visual information) with the context information, dynamically calculates the weight of each input through the attention mechanism, and gives a higher weight to the information related to the current task node so as to optimize the task analysis result.
Specifically, in the task parsing stage the system adopts an improved hierarchical thought chain and decomposes and understands the task layer by layer in combination with the attention mechanism. The implementation details are as follows: the main task targets in the instruction are extracted through the high-level task analysis module, and corresponding high-level task nodes are generated, where a high-level task node comprises the assembly target, operation requirements and execution conditions. Then, the high-level task nodes are further analysed through the subtask recursive analysis module to generate several mid-level tasks and basic operation steps, forming a complete task tree structure in which each node contains a task target, execution steps and their dependency relationships.
Optionally, after the task tree is generated, the system displays the result of task analysis in real time through the human-computer interaction interface, and an operator can check the decomposition step and the execution state of the task on the interface and confirm or modify the adjustment suggestion proposed by the system.
In the foregoing parsing process, dynamic weight adjustment is applied to the key parts of the instruction in combination with the attention mechanism, giving higher weight to the information related to the current task node so that the parsing process prioritises the relevant features and the task parsing result is optimized.
Further, in the task analysis process of some embodiments, the weighted feature vector generated by the multi-modal attention mechanism can be combined, so that the humanoid robot can process the natural language instruction, the visual information and the touch information simultaneously, and fusion analysis of multi-modal data is realized, thereby improving accuracy of task analysis and reliability of execution.
Then, logic verification is carried out on the generated task tree through the task verification and optimization module: the generated task steps are compared against a domain knowledge base, unreasonable operation steps are eliminated, and the rationality of task dependencies and execution conditions is ensured. The execution plan of the task is then optimized through the attention mechanism: the node arrangement in the task tree is adjusted according to task priority and resource requirements to improve execution efficiency, and a final assembly plan is generated.
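As a non-limiting sketch of this kind of rule-based verification, the constraint predicates below are invented placeholders for a real assembly-domain knowledge base:

```python
# Hypothetical domain rules: each rule inspects a proposed step sequence and
# returns a violation message, or None when the sequence is acceptable.
RULES = [
    lambda steps: ("part A must be grasped before it is inserted"
                   if steps.index("insert part A") < steps.index("grasp part A") else None),
    lambda steps: ("calibration must follow fastening"
                   if steps.index("calibrate") < steps.index("fasten") else None),
]

def verify_plan(steps):
    """Run every domain rule over the planned steps and collect any violations."""
    violations = []
    for rule in RULES:
        try:
            message = rule(steps)
        except ValueError:            # a step referenced by the rule is absent; skip it
            message = None
        if message:
            violations.append(message)
    return violations

# Example: this ordering violates the first rule.
print(verify_plan(["insert part A", "grasp part A", "fasten", "calibrate"]))
```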
And sending the assembly plan to a task execution module, and driving the humanoid robot to assemble according to the assembly plan.
It can be understood that in the task execution process, environmental changes in the assembly process of the humanoid robot can be monitored in real time, the focus of task analysis is readjusted by using an attention mechanism, and the task tree structure is dynamically updated according to the detected abnormal conditions.
In this embodiment, unlike existing flexible assembly methods driven by fixed rules or templates, which are only suitable for a specific production environment and have difficulty coping with diversified production requirements, the thought chain task analysis method gives the humanoid robot broader task understanding and execution capability, allowing it to flexibly handle different types of assembly tasks in various production scenes. Whether in small-batch, multi-variety production environments or on production lines requiring real-time adjustment of task strategies, intelligent task analysis and assembly execution can be realized based on the method of this embodiment. During assembly, the attention mechanism can focus on critical steps such as the grasping and mounting of parts, thereby reducing operational errors caused by external disturbances. Compared with a traditional fixed-flow system, the method of this embodiment has strong adaptability, significantly broadens the application range of flexible manufacturing, performs well in assembly tasks with high precision requirements, and ensures assembly quality and reliability.
In some embodiments, when a plurality of tasks to be executed need to be scheduled in a multi-task environment, an optimal task scheduling strategy is generated through a reinforcement learning algorithm according to the priority, the emergency degree and the required resources of the tasks, so that efficient utilization of humanoid robot resources is ensured. In the task scheduling process, the reinforcement learning algorithm continuously optimizes task priority ordering and resource allocation strategies through learning feedback of multitasking, so that the humanoid robot can operate in parallel in a multitasking environment, and the overall assembly efficiency is improved.
In some embodiments, in order to verify the humanoid robot flexible assembly system and method based on the thinking chain of the scheme, appropriate software and hardware support can be provided in an experimental environment to perform verification experiments.
Specifically, the hardware environment includes:
(1) Humanoid robot: equipped with mechanical arms having six or more degrees of freedom, used to simulate human operations such as grasping and assembling, and fitted with vision and force sensors to perceive the environment.
(2) Control host: configured with a high-performance CPU and a GPU supporting deep-learning computation (such as an NVIDIA RTX 4090), running the task analysis model and the control algorithms.
(3) Experimental workbench: used to hold the parts and tools to be assembled and to simulate the actual production environment.
The software environment includes:
(1) Operating system: a Linux system (e.g., Ubuntu 20.04) to ensure hardware compatibility and deep-learning support.
(2) Development platform: model training using PyTorch or TensorFlow, with robot control and sensor data processing implemented through the Robot Operating System (ROS).
(3) Language model: a pre-trained language model (such as GPT-4) is selected and fine-tuned for parsing task instructions.
Before the verification experiment starts, task instruction data and multi-modal data are prepared in advance. The task instruction data comprise natural language descriptions of different assembly tasks and their parsing results, used for training the model; the multi-modal data comprise visual and force-sense data collected during operation of the humanoid robot, used to enhance the perception capability of the model during assembly.
After model training is completed, an experimental area of 3×3 meters is prepared according to the space requirements of the experimental site, ensuring enough room for the humanoid robot to operate. Corresponding safety measures are prepared, including an emergency stop button and an isolation area, to prevent safety risks caused by misoperation of the humanoid robot during the experiment.
The experimental site includes the robot work area, the workbench and the parts storage area. After confirming that the experimental area is clean and has sufficient activity space, the humanoid robot and the control host are started, devices such as the vision sensor and force sensor are connected, and the working state of each device is checked to ensure normal operation of the hardware.
Once the working state of each device and the hardware are confirmed to be normal, the pre-trained and fine-tuned large language model (such as GPT-4) and the multi-modal fusion model are loaded on the control host, and the ROS is started to ensure smooth communication among the modules.
First, data input and task instruction issuing operations are performed, and an assembly task instruction, such as "mount part a onto component B, and perform accuracy calibration", is input by an operator through a natural language. The control host analyzes the natural language instruction through the multi-mode encoder and generates a high-dimensional characteristic representation.
Then, the input task instructions are parsed using the improved thought chain structure and attention mechanism to generate a complete task tree structure including high-level task goals and specific operational steps.
And then, the system carries out logic verification and path planning on the task according to the analyzed task tree, and generates a specific assembly plan and a robot action sequence. By utilizing the path planning function in the ROS, the motion track of the humanoid robot from the part storage area to the workbench is planned, and specific assembly action instructions such as moving to the part A position, grabbing the part A, installing the part A to the component B and the like are generated.
And finally, gradually executing each task step by the humanoid robot according to the generated assembly plan, confirming the position of the part by utilizing a visual sensor, and monitoring the stress condition in the assembly process by utilizing a force sensor. Meanwhile, during the process of executing the assembly task, the assembly state data (such as force sense data and position data) are fed back to the control host in real time. If an abnormality (such as a positional deviation or an abnormality in the assembly force) is detected during the assembly process, the system re-evaluates the task focus through the attention mechanism, adjusts the assembly action, and continues to perform the remaining steps.
In the whole experimental process, the execution time of each task step, the time consumption of path planning, the assembly error condition, sensor data and the like are recorded, the task tree structure automatically generated in the task analysis process by the system and the weight change condition adjusted by the attention mechanism are recorded, and the analysis accuracy is analyzed. If the assembly fails or the task is interrupted, the fault cause is recorded and the effect of self-adaptive adjustment is analyzed.
Through the verification experiment, the consistency of the actual completion assembly task of the humanoid robot and the expected task plan is compared, so that the accuracy of the analysis task and the execution stability are evaluated. Scheduling efficiency and assembly quality during multitasking are analyzed, including priority allocation and resource usage for each task. According to experimental data, the effect of self-adaptive adjustment capability of the system in a dynamic environment is evaluated, such as adjustment time after abnormality is detected, adjustment success rate and the like.
And stopping the operation of the humanoid robot after the experiment is finished, closing all experimental processes on the control host, and finishing and storing data files and logs generated in the experimental process. And checking and maintaining the experimental equipment to ensure that the next experiment can be normally performed.
And (3) carrying out fine adjustment and improvement on the model according to experimental results so as to improve the application effect of the system in an actual flexible manufacturing scene.
Referring to fig. 5, the embodiment of the application further provides a flexible assembly virtual system of a humanoid robot based on a thinking chain, which can implement the flexible assembly method of the humanoid robot based on the thinking chain, and the system comprises:
and the first module is used for acquiring the assembly task instruction provided by the user.
And the second module is used for carrying out task analysis on the assembly task instruction through a thinking chain model, calculating the weight of each word in the assembly task instruction through an attention mechanism in the analysis process, dynamically adjusting an analysis target related to the current task according to the weight, and decomposing the task into a plurality of subtasks layer by layer according to the analysis target to obtain a task tree structure.
And a third module for traversing the task tree structure to determine an assembly plan, wherein the assembly plan includes an execution scheduling order, a task path plan, and assembly action parameters.
And a fourth module for controlling the humanoid robot to execute the assembly operation according to the assembly plan.
It can be understood that the content of the above method embodiment is applicable to the present virtual system embodiment: the functions specifically implemented by the present virtual system embodiment are the same as those of the above method embodiment, and the beneficial effects achieved are likewise the same.
The embodiment of the application also provides a thought-chain-based humanoid robot flexible assembly hardware system, which comprises a memory and a processor, wherein the memory stores a computer program and the processor implements the thought-chain-based humanoid robot flexible assembly method when executing the computer program. The hardware system can be any intelligent terminal, including a tablet computer, a vehicle-mounted computer, and the like.
It can be understood that the content of the above method embodiment is applicable to the present hardware system embodiment: the functions specifically implemented by the present hardware system embodiment are the same as those of the above method embodiment, and the beneficial effects achieved are likewise the same.
Referring to fig. 6, fig. 6 illustrates a hardware structure of a mental chain based humanoid robot flexible assembly hardware system of another embodiment, the hardware system including:
The processor 901 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and is configured to execute related programs so as to implement the technical solutions provided by the embodiments of the present application.
The memory 902 may be implemented in the form of a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 902 may store an operating system and other application programs. When the technical solutions provided in the embodiments of the present specification are implemented by software or firmware, the relevant program code is stored in the memory 902 and invoked by the processor 901 to execute the thought-chain-based humanoid robot flexible assembly method of the embodiments of the present application.
An input/output interface 903 for inputting and outputting information.
The communication interface 904 is configured to implement communication interaction between the device and other devices, and may implement communication through a wired manner (such as USB, network cable, etc.), or may implement communication through a wireless manner (such as mobile network, WIFI, bluetooth, etc.).
A bus 905 transfers information between the various components of the device, such as the processor 901, memory 902, input/output interfaces 903, and communication interfaces 904.
Wherein the processor 901, the memory 902, the input/output interface 903 and the communication interface 904 are communicatively coupled to each other within the device via a bus 905.
The embodiment of the application also provides a computer readable storage medium which stores a computer program, and the computer program realizes the flexible assembly method of the humanoid robot based on the thinking chain when being executed by a processor.
It can be understood that the content of the above method embodiment is applicable to the present storage medium embodiment: the functions of the present storage medium embodiment are the same as those of the above method embodiment, and the beneficial effects achieved are likewise the same.
The memory, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs as well as non-transitory computer executable programs. In addition, the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory remotely located relative to the processor, the remote memory being connectable to the processor through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The embodiments described in the embodiments of the present application are for more clearly describing the technical solutions of the embodiments of the present application, and do not constitute a limitation on the technical solutions provided by the embodiments of the present application, and those skilled in the art can know that, with the evolution of technology and the appearance of new application scenarios, the technical solutions provided by the embodiments of the present application are equally applicable to similar technical problems.
It will be appreciated by persons skilled in the art that the embodiments of the application are not limited by the illustrations, and that more or fewer steps than those shown may be included, or certain steps may be combined, or different steps may be included.
The system embodiments described above are merely illustrative, in that the units illustrated as separate components may or may not be physically separate, i.e., may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
Those of ordinary skill in the art will appreciate that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof.
The terms "first," "second," "third," "fourth," and the like in the description of the application and in the above figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The preferred embodiments of the present application have been described above with reference to the accompanying drawings, and are not thereby limiting the scope of the claims of the embodiments of the present application. Any modifications, equivalent substitutions and improvements made by those skilled in the art without departing from the scope and spirit of the embodiments of the present application shall fall within the scope of the claims of the embodiments of the present application.
Claims (10)
1. The flexible assembly method of the humanoid robot based on the thinking chain is characterized by comprising the following steps of:
Acquiring an assembly task instruction provided by a user;
Task analysis is carried out on the assembly task instruction through a thinking chain model, the weight of each word in the assembly task instruction is calculated through an attention mechanism in the analysis process, an analysis target related to a current task is dynamically adjusted according to the weight, and the task is decomposed into a plurality of subtasks layer by layer according to the analysis target, so that a task tree structure is obtained;
Traversing the task tree structure to determine an assembly plan, wherein the assembly plan comprises an execution arrangement sequence, a task path plan and assembly action parameters;
and controlling the humanoid robot to execute the assembly operation according to the assembly plan.
2. The method of claim 1, wherein the obtaining the user-provided assembly task instruction comprises the steps of:
acquiring multi-modal data provided by a user, wherein the multi-modal data comprises text, voice and visual information;
performing word segmentation, part-of-speech tagging and semantic analysis on the characters and the voices by adopting a natural language processing technology, and extracting main contents and intentions of tasks to obtain text information;
Performing feature extraction and recognition on the visual information by adopting a computer vision technology to obtain symbol information;
and mapping the text information and the symbol information to a vector space to obtain the assembly task instruction.
3. The method according to claim 1, wherein the calculating the weight of each word in the assembly task instruction by the attention mechanism dynamically adjusts the resolution target related to the current task according to the weight, comprising the steps of:
calculating an attention score of each word in the assembly task instruction based on context information through an attention mechanism, wherein the context information comprises a task history state and domain knowledge;
normalizing the attention score to obtain the weight of the word;
and determining the words with the weights larger than a preset threshold as the analysis targets related to the current task.
4. The method of claim 1, wherein said traversing the task tree structure determines an assembly plan comprising the steps of:
traversing the task tree structure to determine the sequence and the dependency relationship of the subtasks;
Performing logic verification on the task tree structure according to the sequence and the dependency relationship of the subtasks, and optimizing the subtask arrangement in the task tree structure based on an attention mechanism to obtain the execution arrangement sequence;
Determining the task path planning by adopting a motion planning algorithm according to the execution arrangement sequence;
performing task decomposition on the subtasks until the subtasks are decomposed into a plurality of basic actions which can be directly executed by the humanoid robot, and determining required assembly action parameters according to the basic actions;
And obtaining an assembly plan according to the execution arrangement sequence, the task path plan and the assembly action parameters.
5. The method according to claim 4, wherein the logic verification is performed on the task tree structure according to the order and the dependency relationship of the subtasks, and the subtask arrangement in the task tree structure is optimized based on an attention mechanism, so as to obtain the execution arrangement order, and the method comprises the following steps:
when a plurality of subtasks to be executed exist at the same time, evaluating the priority and the resource requirement of each subtask through an attention mechanism to obtain an evaluation result;
And determining a task scheduling strategy through a reinforcement learning algorithm according to the evaluation result, and adjusting the subtask arrangement in the task tree structure according to the task scheduling strategy to obtain the execution arrangement sequence.
6. The method according to claim 1, wherein the thought-chain-based humanoid robot flexible assembly method further comprises the steps of:
Acquiring monitoring data, wherein the monitoring data comprises environmental information and assembly state data;
Judging whether the subtasks currently executed have abnormal conditions or not according to the monitoring data;
And when the subtasks have abnormal conditions, carrying out task analysis on the subtasks again through an attention mechanism, and dynamically adjusting the assembly plan corresponding to the subtasks.
7. The method according to claim 6, wherein the mental chain based humanoid robot flexible assembling method further comprises the steps of:
Recording the task tree structure and the assembly state data to obtain an assembly flow result;
And displaying the assembly flow result through an interactive interface.
8. A humanoid robot flexible assembly virtual system based on a thought chain, the virtual system comprising:
the first module is used for acquiring an assembly task instruction provided by a user;
the second module is used for carrying out task analysis on the assembly task instruction through a thinking chain model, calculating the weight of each word in the assembly task instruction through an attention mechanism in the analysis process, dynamically adjusting an analysis target related to the current task according to the weight, and decomposing the task into a plurality of subtasks layer by layer according to the analysis target to obtain a task tree structure;
A third module for traversing the task tree structure to determine an assembly plan, wherein the assembly plan includes an execution arrangement order, a task path plan, and assembly action parameters;
and a fourth module for controlling the humanoid robot to execute the assembly operation according to the assembly plan.
9. A humanoid robot flexible assembly hardware system based on a mental chain, characterized in that it comprises a memory storing a computer program and a processor implementing the method of any of claims 1 to 7 when executing the computer program.
10. A computer readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202411497531.6A CN119238612A (en) | 2024-10-25 | 2024-10-25 | Humanoid robot flexible assembly method, system and medium based on thinking chain |
Publications (1)
Publication Number | Publication Date |
---|---|
CN119238612A true CN119238612A (en) | 2025-01-03 |
Family
ID=94016423
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202411497531.6A Pending CN119238612A (en) | 2024-10-25 | 2024-10-25 | Humanoid robot flexible assembly method, system and medium based on thinking chain |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN119238612A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114398488A (en) * | 2022-01-17 | 2022-04-26 | 重庆邮电大学 | A BILSTM Multi-Label Text Classification Method Based on Attention Mechanism |
US20220413455A1 (en) * | 2020-11-13 | 2022-12-29 | Zhejiang University | Adaptive-learning intelligent scheduling unified computing frame and system for industrial personalized customized production |
CN116108184A (en) * | 2023-02-27 | 2023-05-12 | 重庆邮电大学 | A task-oriented text classification method based on attention mechanism |
CN117370638A (en) * | 2023-12-08 | 2024-01-09 | 中国科学院空天信息创新研究院 | Method and device for decomposing and scheduling basic model task with enhanced thought diagram prompt |
CN118386226A (en) * | 2024-04-18 | 2024-07-26 | 香港理工大学深圳研究院 | Man-machine cooperation assembly method, device, terminal and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |