
CN119668875A - Distributed computing network task scheduling method, device and equipment - Google Patents

Distributed computing network task scheduling method, device and equipment Download PDF

Info

Publication number: CN119668875A
Authority: CN (China)
Prior art keywords: computing power, target, node address, side edge, edge router
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: CN202411801246.9A
Other languages: Chinese (zh)
Inventors: Xie Kun (谢坤), Huang Xiaohong (黄小红), Li Dandan (李丹丹), Zhang Pei (张沛), Zhao Wenrui (赵文瑞)
Current Assignee: Beijing University of Posts and Telecommunications (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Beijing University of Posts and Telecommunications
Application filed by Beijing University of Posts and Telecommunications
Priority: CN202411801246.9A (the priority date is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed)

Classification: Data Exchanges In Wide-Area Networks

Abstract

The invention provides a distributed computing power network task scheduling method, device and equipment, wherein the distributed computing power network comprises a user side edge router and a computing power side edge router. The user side edge router acquires a target computing power demand and determines the current computing power state of the distributed computing power network according to the multiprotocol border gateway protocol. It inputs the target computing power demand and the current computing power state into a first agent reinforcement learning model corresponding to the user side edge router, outputs an initial node address for the target demand via this model, and sends the initial node address to the computing power side edge router. The computing power side edge router judges whether the initial node address needs to be updated, determines a target node address according to the judgment result, and offloads the target computing power demand to the target node corresponding to the target node address.

Description

Distributed computing power network task scheduling method, device and equipment
Technical Field
The disclosure relates to the field of data processing, and in particular relates to a distributed computing power network task scheduling method, device and equipment.
Background
With the growing demand for distributed computing, particularly in complex multi-user, multi-terminal computing environments, conventional task scheduling methods show clear limitations when facing dynamically changing resources, uneven node load, and contention for computing resources.
Existing scheduling schemes struggle to adapt to the demands that different scenarios place on computing resource allocation and task execution efficiency; in particular, when coordination among computing nodes is poor and scheduling delay is high, problems such as resource waste and task execution failure occur frequently.
Disclosure of Invention
Accordingly, an objective of the present disclosure is to provide a method, an apparatus and a device for task scheduling in a distributed computing power network, which are used for solving or partially solving the above-mentioned problems.
Based on the above object, a first aspect of the present disclosure provides a task scheduling method for a distributed computing power network, where the distributed computing power network includes a user side edge router and a computing power side edge router, and the method includes:
The user side edge router obtains a target computing power demand, and determines a current computing power state corresponding to the distributed computing power network according to a multi-protocol border gateway protocol, wherein the current computing power state represents a task allocation state of each computing power node in the current distributed computing power network;
The user side edge router inputs the target computing power demand and the current computing power state into a first agent reinforcement learning model corresponding to the user side edge router, and an initial node address corresponding to the target computing power demand is output via the first agent reinforcement learning model;
The user side edge router sends the initial node address to the computing power side edge router, and the computing power side edge router judges whether the initial node address needs to be updated and determines a target node address according to the judgment result;
And the computing power side edge router offloads the target computing power demand to a target node corresponding to the target node address.
Based on the same inventive concept, a second aspect of the present disclosure proposes a distributed computing power network task scheduling device, including:
The data acquisition module is configured to acquire a target computing power demand by the user side edge router, and determine a current computing power state corresponding to the distributed computing power network according to a multi-protocol border gateway protocol, wherein the current computing power state represents a task allocation state of each computing power node in the current distributed computing power network;
The initial node address determining module is configured to input the target computing power demand and the current computing power state into a first agent reinforcement learning model corresponding to the user side edge router by the user side edge router, and output an initial node address corresponding to the target demand via the first agent reinforcement learning model;
The target node address determining module is configured to send the initial node address to the computing power side edge router by the user side edge router, and the computing power side edge router judges whether the initial node address needs to be updated and determines the target node address according to the judgment result;
and the offloading module is configured to offload the target computing power demand to a target node corresponding to the target node address by the computing power side edge router.
Based on the same inventive concept, a third aspect of the present disclosure proposes an electronic device comprising a memory, a processor and a computer program stored on the memory and executable by the processor, the processor implementing the distributed computing power network task scheduling method as described above when executing the computer program.
Based on the same inventive concept, a fourth aspect of the present disclosure proposes a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the distributed computing power network task scheduling method as described above.
From the foregoing, it can be seen that the present disclosure proposes a method, an apparatus and a device for task scheduling in a distributed computing power network. The user side edge router obtains a target computing power demand and determines the current computing power state of the distributed computing power network according to the multiprotocol border gateway protocol, where the current computing power state represents the task allocation state of each computing power node; this allows the agent corresponding to the user side edge router to make a decision and determine a suitable computing power resource node. The user side edge router inputs the target computing power demand and the current computing power state into a first agent reinforcement learning model corresponding to the user side edge router, which outputs an initial node address for the target demand; the node at this address is one matched with the current computing power state and the user's target demand. The user side edge router then sends the initial node address to the computing power side edge router, which judges whether the initial node address needs to be updated and determines the target node address according to the judgment result: the offloading decision given by the user side model may no longer be applicable after several hops, so the model deployed on the computing power side edge router can redirect the demand before it is finally offloaded. Finally, the computing power side edge router offloads the target computing power demand to the target node corresponding to the target node address. The finally determined target node address is thus more accurate, and the user side and computing power side edge router agents together realize efficient task scheduling in a distributed environment, meeting the efficiency and reliability requirements of different scenarios.
Drawings
In order to more clearly illustrate the technical solutions of the present disclosure or related art, the drawings required for the embodiments or related art description will be briefly described below, and it is apparent that the drawings in the following description are only embodiments of the present disclosure, and other drawings may be obtained according to these drawings without inventive effort to those of ordinary skill in the art.
FIG. 1 is a flow chart of a distributed computing power network task scheduling method according to an embodiment of the present disclosure;
FIG. 2 is a block diagram of a distributed computing power network task scheduler according to an embodiment of the present disclosure;
fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
For the purposes of promoting an understanding of the principles and advantages of the disclosure, reference will now be made to the embodiments illustrated in the drawings and specific language will be used to describe the same.
It should be noted that unless otherwise defined, technical or scientific terms used in the embodiments of the present disclosure should be given the ordinary meaning as understood by one of ordinary skill in the art to which the present disclosure pertains. The terms "first," "second," and the like, as used in embodiments of the present disclosure, do not denote any order, quantity, or importance, but rather are used to distinguish one element from another. The word "comprising" or "comprises", and the like, means that elements or items preceding the word are included in the element or item listed after the word and equivalents thereof, but does not exclude other elements or items. The terms "connected" or "connected," and the like, are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "upper", "lower", "left", "right", etc. are used merely to indicate relative positional relationships, which may also be changed when the absolute position of the object to be described is changed.
The terms referred to in this disclosure are explained as follows:
MP-BGP: The Multiprotocol Border Gateway Protocol (MP-BGP) is an extension of traditional BGP that supports exchanging routing information for multiple protocol families (e.g., IPv4, IPv6). By introducing an Address Family Identifier (AFI) and a Subsequent Address Family Identifier (SAFI), it can deliver routing information of multiple protocols, such as unicast, multicast and MPLS VPN, in the same BGP session. MP-BGP provides flexible route control and management in a multiprotocol environment, is widely applied to IPv6 migration, MPLS virtual private networks and complex multiprotocol networks, and realizes efficient and flexible route distribution and path selection.
MARL: Multi-agent reinforcement learning (MARL) applies reinforcement learning in a multi-agent system, letting multiple agents learn and decide together in the same environment to solve cooperative or competitive tasks. Each agent obtains information by interacting with the environment and the other agents, and updates its policy based on the feedback obtained. MARL is often used in complex distributed task scheduling, autonomous driving, robot collaboration, and other scenarios where multiple agents cooperate or compete. In MADDPG, one such MARL algorithm, each agent has its own policy network (Actor) for making decisions based on current observations, while the agent's value function (Critic) is centralized and has access to the other agents' observations and actions. This architecture allows an agent to use global information to evaluate the value of its actions when making local decisions, improving learning efficiency.
MADDPG: The Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm is a reinforcement learning algorithm for multi-agent environments, an extension of the Deep Deterministic Policy Gradient (DDPG) algorithm. MADDPG is mainly used to solve cooperation and competition problems in multi-agent environments, especially where interactions between agents can be very complex.
OSPF: The Open Shortest Path First (OSPF) protocol is a link-state routing protocol for Internet Protocol (IP) networks.
With the growing demand for distributed computing, particularly in complex multi-user, multi-terminal computing environments, conventional task scheduling methods show clear limitations when facing dynamically changing resources, uneven node load, and contention for computing resources.
In the evolution from centralized to distributed, the computing power network still faces challenges such as uneven resource distribution, low computing power utilization efficiency, and the lack of effective computing power scheduling. In AI large-model training in particular, the decentralized distributed computing power network is more likely to land first on model inference, and its future growth can be expected to be large. As distributed computing power technology is still at the research stage, it faces a series of problems such as communication delay, data privacy and resource scheduling.
Existing scheduling schemes struggle to adapt to the demands that different scenarios place on computing resource allocation and task execution efficiency; in particular, when coordination among computing nodes is poor and scheduling delay is high, problems such as resource waste and task execution failure occur frequently.
Based on the above description, this embodiment proposes a task scheduling method of a distributed computing power network, as shown in fig. 1, where the distributed computing power network includes a user side edge router and a computing power side edge router, and the method includes:
Step 101, a user side edge router obtains a target computing power demand, and determines a current computing power state corresponding to a distributed computing power network according to a multi-protocol border gateway protocol, wherein the current computing power state represents a task allocation state of each computing power node in the current distributed computing power network.
In specific implementation, the distributed computing power network is a novel computing model that converts the traditional centralized computing architecture into distributed computing, forming a huge computing network by connecting computing power nodes. Its main characteristics are the distribution and sharing of resources, dynamic and efficient task scheduling, and parallel and distributed processing capability, which improve resource utilization and task execution efficiency. In addition, it has strong fault tolerance and high availability: single points of failure are avoided through task migration and redundancy design, and low-delay, high-bandwidth network connections ensure efficient communication between nodes.
The user side edge router obtains the target computing power requirement of a user, and determines the current computing power state corresponding to the distributed computing power network according to the multi-protocol border gateway protocol, wherein the current computing power state represents the task allocation state of each computing power node in the current distributed computing power network.
The meaning of the fields of the MP-BGP protocol is described below:
The MP-BGP extended community attribute is an 8-byte path attribute used to add extra routing information in a BGP Update message. It consists of three parts: a 1-byte Type field identifying the attribute category, a 1-byte Sub-Type field further refining the type usage, and a 6-byte value field carrying specific numerical or policy information. This structure lets BGP flexibly extend its functionality and carry service-specific demand information.
The Type field is typically made up of two parts, the upper 4 bits representing the Type and the lower 4 bits representing the subtype or other characteristic of the attribute. For carrying user power demand and power resource information, the Type field is set to 0x03 (Opaque Extended Community) for custom purposes.
For the power demand bearer, the Sub-Type field is set to 0x01. For the computing power resource information, the Sub-Type field is set to 0x02 (Route Target).
In the 6-byte value field, when carrying the computing power demand, the upper three bytes represent the user's CPU requirement and the lower three bytes the user's storage requirement. When carrying computing power resource information, the CPU and storage unit prices are combined into one value (because the resource information is a 7-tuple), so that the 6 bytes represent the resulting six-tuple of the computing resource, one byte per value.
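As a concrete illustration of this layout, the following minimal Python sketch packs and unpacks such an 8-byte community. The helper names and the exact 3-byte/3-byte split of the value field are illustrative assumptions, not part of the patent.

```python
import struct

TYPE_OPAQUE = 0x03       # Type field for the custom opaque extended community
SUBTYPE_DEMAND = 0x01    # Sub-Type carrying a user's computing power demand
SUBTYPE_RESOURCE = 0x02  # Sub-Type carrying computing power resource information

def encode_demand(cpu_demand: int, storage_demand: int) -> bytes:
    """Pack a computing power demand into one 8-byte extended community.

    Following the layout above: 1-byte Type, 1-byte Sub-Type, then a 6-byte
    value whose upper three bytes carry the CPU demand and lower three bytes
    the storage demand (the 3/3 split is an assumption).
    """
    value = cpu_demand.to_bytes(3, "big") + storage_demand.to_bytes(3, "big")
    return struct.pack("!BB", TYPE_OPAQUE, SUBTYPE_DEMAND) + value

def decode(community: bytes) -> dict:
    """Unpack an 8-byte extended community back into its fields."""
    type_, subtype = struct.unpack("!BB", community[:2])
    if subtype == SUBTYPE_DEMAND:
        return {"type": type_,
                "cpu": int.from_bytes(community[2:5], "big"),
                "storage": int.from_bytes(community[5:8], "big")}
    # Resource information: one byte per element of the merged six-tuple.
    return {"type": type_, "resource": list(community[2:8])}

if __name__ == "__main__":
    c = encode_demand(cpu_demand=120_000, storage_demand=64)
    print(decode(c))  # {'type': 3, 'cpu': 120000, 'storage': 64}
```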
Step 102, the user side edge router inputs the target computing power demand and the current computing power state into the first agent reinforcement learning model corresponding to the user side edge router, and outputs the initial node address corresponding to the target demand via the first agent reinforcement learning model.
In specific implementation, a first agent reinforcement learning model is deployed on the user side edge router. The user side edge router inputs the target computing power demand and the current computing power state into this model, which makes the decision and outputs the initial node address corresponding to the target computing power demand.
Step 103, the user side edge router sends the initial node address to the computing power side edge router, and the computing power side edge router judges whether the initial node address needs to be updated and determines a target node address according to the judgment result.
In specific implementation, the user side edge router sends the initial node address obtained from its decision to the computing power side edge router, on which an agent reinforcement learning model is also deployed. The computing power state may change while the data packet travels from the user side edge router to the computing power side edge router to which the computing power resource node is attached. Therefore, the computing power side edge router judges whether the initial node address needs to be updated and determines the target node address according to the judgment result.
Step 104, the computing power side edge router offloads the target computing power demand to the target node corresponding to the target node address.
In specific implementation, after the computing power side edge router determines the target node address, it can offload the user's target computing power demand to the corresponding target node, realizing intelligent collaborative scheduling among the computing power nodes.
Through this scheme, the current computing power state obtained via MP-BGP lets the user side agent select a suitable computing power resource node, and the computing power side edge router re-checks and, if needed, updates that choice before offloading, so the finally determined target node address is more accurate. Together, the user side and computing power side edge router agents realize efficient task scheduling in a distributed environment and meet the efficiency and reliability requirements of different scenarios.
In some embodiments, in a multi-user, multi-terminal scenario a resource conflict may occur: different users' computing power demands are scheduled to the same computing power resource node and arrive at that node close together in time, so that the scheduling decision given by the user side router agent is no longer applicable.
In some embodiments, the distributed computing power network further includes a backbone router, and step 103 specifically includes:
Step 1031, determining a target message path by using the Open Shortest Path First protocol and the initial node address.
Step 1032, determining whether a backbone router exists among the at least one intermediate node contained in the target message path, where the intermediate nodes are the nodes other than the initial target node corresponding to the initial node address and the current node where the user side edge router is located.
Step 1033, in response to the backbone router existing, determining a new initial node address from the target computing power demand using the backbone router, the backbone router sending the new initial node address to the computing power side edge router.
Or alternatively
Step 1034, in response to no backbone router existing, the user side edge router sends the initial node address to the computing power side edge router.
In specific implementation, the user side edge router determines a target message path from its own routing table using the OSPF protocol and the initial node address; the target message path is the shortest path between the current node and the initial node address.
All intermediate nodes on the target message path are then determined, the intermediate nodes being the nodes other than the initial target node corresponding to the initial node address and the current node where the user side edge router is located. Whether a backbone router is deployed among these intermediate nodes determines whether the initial node address needs to be updated.
If a backbone router exists, the backbone router makes a decision according to the target computing power demand and determines a new initial node address, specifically as follows:
the computing power state corresponding to the distributed computing power network is obtained from the message queue and taken as the first computing power state; the target computing power demand and the first computing power state are then input into a second agent reinforcement learning model corresponding to the backbone router, which makes a decision and outputs the new initial node address for the target demand.
In this embodiment, the distributed computing power network synchronizes computing power information through a Redis Streams message queue. Each computing power resource node acts as a producer of messages: it monitors the state of its own computing power resources and periodically writes the computing power information into the stream. Each router that deploys an agent acts as a consumer: by subscribing to the message topic, it periodically obtains the latest computing power resource information and passes it to the agent.
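A minimal sketch of this producer/consumer pattern with the redis-py client is shown below; the stream name, field names and connection parameters are illustrative assumptions.

```python
import time
import redis  # redis-py client; assumes a reachable Redis instance

STREAM = "computing_power_state"  # illustrative stream name

def publish_node_state(r: redis.Redis, node_id: str, cpu_load: float,
                       storage_load: float) -> None:
    """Producer side: a computing power resource node periodically reports its state."""
    r.xadd(STREAM, {
        "node": node_id,
        "cpu_load": cpu_load,
        "storage_load": storage_load,
        "ts": time.time(),
    }, maxlen=10_000)  # cap the stream so it does not grow unboundedly

def read_latest_states(r: redis.Redis, count: int = 100) -> list:
    """Consumer side: an agent-hosting router pulls the most recent entries."""
    entries = r.xrevrange(STREAM, count=count)
    return [{k.decode(): v.decode() for k, v in fields.items()}
            for _, fields in entries]

if __name__ == "__main__":
    r = redis.Redis(host="localhost", port=6379)
    publish_node_state(r, node_id="node-7", cpu_load=0.42, storage_load=0.18)
    print(read_latest_states(r, count=5))
```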
If no backbone router exists, the initial node address is sent directly to the computing power side edge router.
With this scheme, the computing power state may change while the data packet travels from the user side edge router, through backbone routers in the network, to the computing power side edge router attached to the computing power resource node. Therefore, when the packet passes a backbone router, the second agent reinforcement learning model deployed on that router re-determines a new initial node address that better matches the computing power state of each node in the current distributed computing power network, making the new initial node address more accurate.
In some embodiments, backbone routers are defined as routers with degree greater than 2; a router with degree equal to 2 can only forward data packets from one port to the other and has no scope for intelligent scheduling.
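The following sketch illustrates this check, approximating OSPF by a weighted shortest path over a topology graph and applying the degree-greater-than-2 rule; the use of networkx and all names here are illustrative assumptions.

```python
from typing import Optional

import networkx as nx

def find_backbone_on_path(topology: nx.Graph, src: str, dst: str) -> Optional[str]:
    """Return the first backbone router (degree > 2) among the intermediate
    nodes of the OSPF-style shortest path from src to dst, or None."""
    path = nx.shortest_path(topology, src, dst, weight="cost")
    for node in path[1:-1]:  # exclude the current node and the target node
        if topology.degree(node) > 2:
            return node
    return None

if __name__ == "__main__":
    g = nx.Graph()
    g.add_edge("user_edge", "r1", cost=1)
    g.add_edge("r1", "r2", cost=1)
    g.add_edge("r1", "r3", cost=3)   # extra link makes r1 a backbone router
    g.add_edge("r2", "power_edge", cost=1)
    print(find_backbone_on_path(g, "user_edge", "power_edge"))  # r1
```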
In some embodiments, step 103 specifically includes:
Step 1031, the computing power side edge router determines a computing power resource corresponding to the initial node address.
Step 1032, determining whether the computing power resource meets the target computing power requirement.
Step 1033, in response to the computing power resource meeting the target computing power requirement, determining that updating of the initial node address is not required, and taking the initial node address as a target node address.
Or alternatively
Step 1034, in response to the computing power resource not meeting the target computing power requirement, determining that an initial node address needs to be updated, updating the initial node address by a computing power side edge router until the computing power resource meets the target computing power requirement, obtaining an updated node address, and taking the updated node address as a target node address.
In specific implementation, the computing power side edge router determines computing power resources of the edge nodes corresponding to the initial node addresses, and judges whether the computing power resources can meet target computing power requirements of users.
If the computing power resource is enough to meet the target computing power requirement, the edge node can meet the user requirement, and the initial node address is not required to be updated at the moment, and is taken as the target node address.
If the computing power resource does not meet the target computing power demand, the edge node cannot meet the user's requirement, and it is determined that the initial node address needs to be updated. The computing power side edge router updates the initial node address as follows:
the computing power side edge router obtains the computing power state of the distributed computing power network at the current moment from the message queue and takes it as the second computing power state.
A third agent reinforcement learning model is deployed on the computing power side edge router; the target computing power demand and the second computing power state are input into this model, which outputs a new node address.
The computing power side edge router corresponding to the new node address then judges whether that node's computing power resource is enough to meet the target demand. If not, a new node address is determined again by the computing power side edge router corresponding to the current new address, until the computing power resource of the node corresponding to the new node address meets the target demand; that address is taken as the updated node address.
With this scheme, the offloading decision given by the user side model may no longer be applicable after several hops, so the model deployed on the computing power side edge router can redirect the demand before it is finally offloaded. The third agent reinforcement learning model re-determines the node address so that it better matches the computing power state of each node in the current distributed computing power network, making the resulting address more accurate.
In some embodiments, after step 1034, the method further comprises:
Step A, counting the number of updates of the initial node address;
Step B, in response to the number of updates being larger than a preset update-count threshold while the computing power resource of the node corresponding to the new node address still does not meet the target computing power demand, taking the node address obtained from the last update as the updated node address.
When the data packet arrives at the computing power side edge router, the agent deployed there may, while rescheduling resources, cause the packet to circulate in the network indefinitely because no computing power resource node has enough resources to satisfy the demand.
The number of updates of the initial node address is therefore counted. When it exceeds the preset threshold and the computing power resource of the node corresponding to the new node address still does not meet the target demand, no further node address is determined; the node address obtained from the last update is used directly as the updated node address.
With this scheme, updating stops once the number of updates of the initial node address reaches the limit, avoiding problems such as network congestion caused by data packets circulating in the network indefinitely.
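A minimal sketch of this capped redirect loop is shown below; the Demand and Node types and the select_node callback stand in for the third agent reinforcement learning model and are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Demand:
    cpu: float       # required computation (FLOPs)
    storage: float   # required storage (GB)

@dataclass
class Node:
    cpu_avail: float
    storage_avail: float

    def satisfies(self, d: Demand) -> bool:
        return self.cpu_avail >= d.cpu and self.storage_avail >= d.storage

def resolve_target(demand: Demand, initial_addr: str, nodes: dict,
                   select_node, max_updates: int = 3) -> str:
    """Redirect until a node satisfies the demand or the threshold is hit."""
    addr, updates = initial_addr, 0
    while not nodes[addr].satisfies(demand):
        if updates >= max_updates:
            break  # keep the last assignment and wait in its task queue
        addr = select_node(demand, nodes)  # re-decision by the deployed agent
        updates += 1
    return addr

if __name__ == "__main__":
    nodes = {"a": Node(1e9, 10), "b": Node(5e12, 500)}
    pick_best = lambda d, ns: max(ns, key=lambda k: ns[k].cpu_avail)
    print(resolve_target(Demand(1e12, 100), "a", nodes, pick_best))  # b
```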
This embodiment provides a distributed computing power network task scheduling method in which multi-agent reinforcement learning models deployed on routers make the decisions and allocate suitable computing power resource nodes to offload the computing power demands of access users. Before task scheduling, the network distributes computing power information through the MP-BGP protocol, ensuring that every router holds global computing power information. During scheduling, routing in the network is realized through the default OSPF protocol, and reinforcement learning models are deployed only on the user side edge routers, the network backbone routers, and the computing power side edge routers. Since the state of computing power resources changes continuously in a multi-user access scenario, the offloading decision given by the user side model may no longer be applicable after several hops, so the models deployed on the backbone routers and the computing power side edge routers can redirect the demand before final offloading. When the number of offloading redirections at the computing power side edge router reaches a threshold, the edge router no longer reschedules the task but adds it to the task queue of the last assigned computing power node to wait, avoiding a demand being endlessly rescheduled due to insufficient resources.
In some embodiments, step 102 specifically includes:
Step 1021, the user side edge router obtains the resource node information of the distributed computing power network, and determines the available computing resources and the available storage resources corresponding to each resource node according to the resource node information;
Step 1022, inputting the target computing power demand and the current computing power state into a first agent reinforcement learning model corresponding to the user side edge router, wherein the first agent reinforcement learning model comprises constraint conditions and an optimization objective: the constraints are the target computation amount and target storage task corresponding to the target computing power demand, and the objective is maximization of the target reward function of the first agent reinforcement learning model;
Step 1023, computing the first agent reinforcement learning model to obtain the initial node address corresponding to the target computing power demand.
In the implementation, a user side edge router acquires resource node information of a distributed computing power network, and determines available computing resources and available storage resources corresponding to each resource node according to the resource node information.
Illustratively, the distributed computing power network contains M users, each user $u \in \{1,2,3,\ldots,M\}$ having several computing tasks. The user's computing task model can be described as a set $T_u = \{t_{u,1}, t_{u,2}, \ldots, t_{u,K_u}\}$, where each computing task $t_{u,k}$ is represented as a two-tuple $\{CPU_{u,k}, Storage_{u,k}\}$ giving the computation amount of the task in FLOPs (floating point operations) and its storage task in GB.
At the same time there are N computing power resource nodes in the network; each resource node $j \in \{1,2,3,\ldots,N\}$ consists of one or two servers. A node comprising both kinds of servers has both computing and storage capabilities; a node comprising only one kind of server has only the corresponding capability.
For a computing server, its triple is $\{CPU_j, Cost_{cpu,j}, Load_{cpu,j}\}$, where $CPU_j$ is the total computing power of the server in FLOPS, $Cost_{cpu,j}$ is its usage cost, and $Load_{cpu,j}$ is its load. For a storage server, its four-tuple is $\{Storage_j, Cost_{storage,j}, Load_{storage,j}, Band_j\}$, where $Storage_j$ is the storage capacity in GB, $Cost_{storage,j}$ is the usage cost (charged per unit used), $Load_{storage,j}$ is its load, and $Band_j$ is the storage bandwidth in GB/s.
Thus, each computing power resource node is represented as:
$$R_j = \{CPU_j,\ Cost_{cpu,j},\ Load_{cpu,j},\ Storage_j,\ Cost_{storage,j},\ Load_{storage,j},\ Band_j\}$$
The available computing resources and available storage resources of each resource node are determined from the resource node information, expressed as:
$$CPU_j^{avail} = CPU_j \cdot (1 - Load_{cpu,j}), \qquad Storage_j^{avail} = Storage_j \cdot (1 - Load_{storage,j})$$
where $CPU_j^{avail}$ is the available computing resource and $Storage_j^{avail}$ is the available storage resource.
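The seven-tuple and the derived available resources can be sketched as follows; the availability formula capacity × (1 − load) matches the reconstruction above but remains an assumption.

```python
from dataclasses import dataclass

@dataclass
class ResourceNode:
    cpu: float            # CPU_j, total computing power (FLOPS)
    cost_cpu: float       # Cost_cpu,j, usage cost of compute
    load_cpu: float       # Load_cpu,j, compute load in [0, 1]
    storage: float        # Storage_j, storage capacity (GB)
    cost_storage: float   # Cost_storage,j, usage cost of storage
    load_storage: float   # Load_storage,j, storage load in [0, 1]
    band: float           # Band_j, storage bandwidth (GB/s)

    @property
    def cpu_avail(self) -> float:
        return self.cpu * (1.0 - self.load_cpu)

    @property
    def storage_avail(self) -> float:
        return self.storage * (1.0 - self.load_storage)

if __name__ == "__main__":
    r7 = ResourceNode(cpu=8e12, cost_cpu=0.5, load_cpu=0.25,
                      storage=2048, cost_storage=0.1, load_storage=0.5, band=3.2)
    print(r7.cpu_avail, r7.storage_avail)  # 6e12 1024.0
```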
The target computing power demand and the current computing power state are input into the first agent reinforcement learning model corresponding to the user side edge router. The model comprises constraint conditions and an optimization objective: the constraints are the target computation amount and target storage task corresponding to the demand, and the objective is maximization of the model's target reward function. The model is then computed to obtain the initial node address corresponding to the target computing power demand.
Based on the above example, the constraint conditions of the first agent reinforcement learning model specifically include:
The target computation amount and target storage task corresponding to the target computing power demand are determined, and the constraint is expressed as:
$$CPU_{u,k} \le CPU_j^{avail}, \qquad Storage_{u,k} \le Storage_j^{avail}$$
where each computing power task $t_{u,k}$ in the target demand is a two-tuple $\{CPU_{u,k}, Storage_{u,k}\}$, $CPU_{u,k}$ is the target computation amount, $Storage_{u,k}$ is the target storage task, $CPU_j^{avail}$ is the available computing resource, and $Storage_j^{avail}$ is the available storage resource.
The optimization objective of the first agent reinforcement learning model specifically comprises:
In cooperative multi-agent MADDPG, the state of each agent contains both local information and partial global information for efficient scheduling of multi-user tasks. The local state $s_{i,local}$ of agent i contains the current computing task $t_{u,k}$ to be scheduled and the agent's own routing table, which holds the next-hop node list and reachability information. The global state $s_{i,global}$ contains the information of all global computing power resource nodes.
The load of each resource node in the distributed computing power network is determined, and the global load imbalance of the agents is determined from the loads of all resource nodes, expressed as:
$$B = \frac{1}{N}\sum_{j=1}^{N}\left(\frac{Load_{cpu,j} + Load_{storage,j}}{2} - \overline{Load}\right)^2$$
where B is the global load imbalance, N is the total number of resource nodes in the distributed computing power network, j indexes the resource nodes, $Load_{cpu,j}$ is the load of the computing server at node j, $Load_{storage,j}$ is the load of the storage server at node j, and $\overline{Load}$ is the average load over all resource nodes, determined from their individual loads.
Thus the state space of agent i is $S_i = s_{i,local} \cup s_{i,global}$. The action of each agent is to schedule a computing task onto some computing power resource node, with action space $A \in \{a_1, a_2, a_3, \ldots, a_i\}$, where $a_i = j$ means agent i schedules task $t_{u,k}$ to execute on computing power resource node j. The collaborative reward function $R_{cooperative}$ measures whether the agents' allocation decisions optimize global resource utilization, task cost, delay and load balance; it is determined from the global load imbalance and expressed as:
$$R_{cooperative} = \sum_{u=1}^{M}\sum_{k}\left(\alpha\, U_{u,k,j} - \beta\, Cost_{u,k,j} - \gamma\, Delay_{u,k,j}\right) - \delta\, B$$
where $R_{cooperative}$ is the collaborative reward function, M is the total number of users in the distributed computing power network, $U_{u,k,j}$ is the resource utilization of computing task $t_{u,k}$ at computing power resource node j, $Cost_{u,k,j}$ is the usage cost of completing task $t_{u,k}$ at node j, $Delay_{u,k,j}$ is the delay of the task, α, β and γ are preset coefficients, and δ is the penalty weight of the imbalance, used to balance local rewards against global balance.
The specific determination of $Delay_{u,k,j}$ is as follows: the delay of a task comprises the network propagation delay from the agent to the computing power resource node (measured by hop count), the queuing delay of the task at the node, and the node's processing delay, expressed as:
$$Delay_{u,k,j} = Hop_{i,j} + T^{queue}_{u,k,j} + \frac{CPU_{u,k}}{CPU_j^{avail}}$$
where $Hop_{i,j}$ is the number of hops from agent i to target node j, $T^{queue}_{u,k,j}$ is the queuing delay of the demand in node j's queue, and the last term is the processing delay of the task at node j.
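A numeric sketch of these reward terms is given below. Since the exact functional forms in the original formula images are not fully recoverable, the combinations used here (mean-squared imbalance, hop + queuing + processing delay, and a weighted utilization/cost/delay sum minus an imbalance penalty) are assumptions consistent with the definitions above; all coefficients are illustrative.

```python
def load_imbalance(loads: list) -> float:
    """Global load imbalance B: spread of per-node load around the mean."""
    mean = sum(loads) / len(loads)
    return sum((l - mean) ** 2 for l in loads) / len(loads)

def task_delay(hops: int, queue_delay: float, cpu_demand: float,
               cpu_avail: float) -> float:
    """Propagation (hop-based) + queuing + processing delay of one task."""
    return hops + queue_delay + cpu_demand / cpu_avail

def cooperative_reward(utilizations, costs, delays, loads,
                       alpha=1.0, beta=0.5, gamma=0.5, delta=1.0) -> float:
    """R_cooperative over all scheduled tasks, penalized by global imbalance."""
    local = sum(alpha * u - beta * c - gamma * d
                for u, c, d in zip(utilizations, costs, delays))
    return local - delta * load_imbalance(loads)

if __name__ == "__main__":
    d = task_delay(hops=3, queue_delay=0.2, cpu_demand=1e12, cpu_avail=5e12)
    print(d)  # 3.4
    print(cooperative_reward([0.8], [0.3], [d], [0.4, 0.6, 0.5]))
```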
Cooperative MADDPG includes a centralized Critic network $Q_{cooperative}(S, a)$, which evaluates the joint Q value of all agents' actions from the global state set S and the joint action set a, and is used to guide policy optimization.
The target reward function of the first agent reinforcement learning model is determined from the collaborative reward function, expressed as:
$$y = \mathbb{E}\left[R_{cooperative} + \gamma\, Q_{cooperative}(S', a' \mid \theta^{Q})\right]$$
where y is the target reward function; $\pi_i(a_i \mid s_i)$ is the distributed Actor network, which updates its policy from the local state and part of the global information; S is the global state set; a is the agents' action set; $Q_{cooperative}(S, a)$ is the centralized Critic network; $\gamma\, Q_{cooperative}(S', a' \mid \theta^{Q})$ is the discounted return of the action value function at the next state S′ with the action a′ taken by the agents in that state; $\theta^{Q}$ are the parameters of the Q network, i.e. of the neural network model of the action value function Q(S, a); $\theta^{\pi}$ are the parameters of the policy network (the agent's action selection mechanism); S′ is the next state; a′ is the action selected by the agents in state S′; and E is the mathematical expectation.
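A minimal PyTorch sketch of the centralized Critic and its TD-target regression under this scheme is shown below; the network sizes, dimensions and hyperparameters are illustrative assumptions, not values from the patent.

```python
import torch
import torch.nn as nn

class CentralizedCritic(nn.Module):
    """Q_cooperative(S, a): scores the joint state and joint action."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))

def critic_loss(critic: CentralizedCritic, target_critic: CentralizedCritic,
                s, a, r, s_next, a_next, gamma: float = 0.95) -> torch.Tensor:
    """TD target y = R_cooperative + gamma * Q(S', a'); regress Q(S, a) to y."""
    with torch.no_grad():
        y = r + gamma * target_critic(s_next, a_next)
    return nn.functional.mse_loss(critic(s, a), y)

if __name__ == "__main__":
    critic = CentralizedCritic(state_dim=8, action_dim=4)
    target = CentralizedCritic(state_dim=8, action_dim=4)
    s, a = torch.randn(32, 8), torch.randn(32, 4)
    r, s2, a2 = torch.randn(32, 1), torch.randn(32, 8), torch.randn(32, 4)
    print(critic_loss(critic, target, s, a, r, s2, a2).item())
```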
With this scheme, collaboration is realized mainly through the global reward function $R_{cooperative}$, the centralized Critic network, and global information sharing. The global reward function contains global objective terms such as resource utilization, task cost, delay and load imbalance, guiding each agent's action selection toward optimizing the whole system. Under the CTDE (centralized training, decentralized execution) framework, the centralized Critic can access the states and actions of all agents during training and evaluate the contribution of joint actions to the global reward, prompting the agents to optimize cooperatively during policy updates. Meanwhile, the agents share global information such as load states during execution, forming a cooperative effect in local decisions.
In this embodiment, multi-agent reinforcement learning deploys multiple agents on different nodes in the network, each capable of independent learning and decision-making. Compared with a centralized scheduling method, the distributed agents can make rapid scheduling decisions based on local information, greatly improving the system's response speed and task scheduling efficiency. Through reinforcement learning, an agent optimizes its decision policy via continuous interaction and feedback, gradually learning how to better allocate tasks and avoid resource conflicts. This learning mechanism gives the agents adaptability in a dynamically changing network environment, letting them respond quickly to fluctuations in network traffic and changes in computing power resources.
Meanwhile, in a distributed computing power network the computing power resources are scattered across different physical locations, and a traditional centralized scheduling method easily becomes a system bottleneck as the volume of task requests grows, causing single-point failures and scheduling delays. By dispersing scheduling onto the agent nodes, multi-agent reinforcement learning avoids overloading a central node and improves the system's scalability and fault tolerance: even if some nodes fail, the remaining agents can continue scheduling according to the current network state, ensuring continuous operation. The invention ensures synchronization of computing power information in the distributed scenario through the message queue mechanism. The reinforcement-learning-based scheduling can also adjust dynamically to real-time feedback from the network: each agent adjusts its scheduling policy according to its own observations and the global reward function, so that tasks are allocated reasonably among nodes and resource overload or idleness is avoided. The agents' decision-making ability improves over time, finally achieving global optimization. Such adaptive scheduling improves not only network resource utilization but also the efficiency and quality of task execution.
The distributed computing power network task scheduling method of this embodiment suits complex multi-user, multi-node environments. For example, in large-scale cloud computing or edge computing networks, the diversity and complexity of tasks place higher demands on scheduling. Through MARL, customized scheduling policies can be created for different types of tasks and resources, meeting users' requirements on task delay, bandwidth, computing capability, and so on. With its high adaptability, distributed architecture, high resource utilization and strong scalability, the method offers a promising solution for task scheduling in large-scale distributed computing environments.
It should be noted that the method of the embodiments of the present disclosure may be performed by a single device, such as a computer or a server. The method of the embodiment can also be applied to a distributed scene, and is completed by mutually matching a plurality of devices. In the case of such a distributed scenario, one of the devices may perform only one or more steps of the methods of embodiments of the present disclosure, the devices interacting with each other to accomplish the methods.
It should be noted that the foregoing describes some embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments described above and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
Based on the same inventive concept, the present disclosure also provides a distributed computing power network task scheduling device corresponding to the method of any embodiment.
Referring to fig. 2, fig. 2 is a distributed computing power network task scheduling device according to an embodiment, including:
A data acquisition module 201, configured to acquire a target computing power requirement by a user side edge router, and determine a current computing power state corresponding to a distributed computing power network according to a multiprotocol border gateway protocol, wherein the current computing power state represents a task allocation state of each computing power node in the current distributed computing power network;
An initial node address determining module 202 configured to input the target computing power requirement and the current computing power state to a first agent reinforcement learning model corresponding to the user side edge router by the user side edge router, and output an initial node address corresponding to the target computing power requirement via the first agent reinforcement learning model;
The target node address determining module 203 is configured to send the initial node address to the computing power side edge router by the user side edge router, and the computing power side edge router judges whether the initial node address needs to be updated and determines the target node address according to the judgment result;
An offloading module 204 configured to offload the target computing power demand to the target node corresponding to the target node address by the computing power side edge router.
In some embodiments, the distributed computing power network further includes a backbone router, and the target node address determining module 203 specifically includes:
a message path determining unit configured to determine a target message path by using an open shortest path first protocol and the initial node address;
the judging unit is configured to judge whether a backbone router exists among the at least one intermediate node contained in the target message path, the intermediate nodes being the nodes other than the initial target node corresponding to the initial node address and the current node where the user side edge router is located;
An updating unit configured to determine a new initial node address from the target computing power demand using the backbone router in response to the existence of the backbone router, the backbone router transmitting the new initial node address to the computing power side edge router, or
And a sending unit configured to send the initial node address to the computing power side edge router in response to no backbone router existing.
In some embodiments, the updating unit is specifically configured to:
acquire the computing power state corresponding to the distributed computing power network from the message queue and take it as the first computing power state;
and input the target computing power demand and the first computing power state into a second agent reinforcement learning model corresponding to the backbone router, and output a new initial node address corresponding to the target demand via the second agent reinforcement learning model.
In some embodiments, the target node address determining module 203 specifically includes:
the resource determining unit is configured to determine the computing power resource corresponding to the initial node address by the computing power side edge router;
a judging unit configured to judge whether the computing power resource satisfies the target computing power demand;
a first target node address determination unit configured to determine that an update of an initial node address is not required in response to the computing power resource satisfying the target computing power demand, taking the initial node address as a target node address, or
And the second target node address determining unit is configured to determine that an initial node address needs to be updated in response to the computing power resource not meeting the target computing power demand, and update the initial node address by a computing power side edge router until the computing power resource meets the target computing power demand, obtain an updated node address, and take the updated node address as a target node address.
In some embodiments, the second target node address determining unit is specifically configured to:
the computing power side edge router obtains the computing power state of the distributed computing power network at the current moment from the message queue and takes it as the second computing power state;
inputs the target computing power demand and the second computing power state into a third agent reinforcement learning model corresponding to the computing power side edge router, and outputs a new node address via the third agent reinforcement learning model;
and takes the new node address as the updated node address once the computing power resource of the node corresponding to the new node address meets the target computing power demand.
In some embodiments, the initial node address determination module 202 is specifically configured to:
the user side edge router obtains the resource node information of the distributed computing power network, and determines the available computing resources and available storage resources of each resource node from the resource node information;
inputs the target computing power demand and the current computing power state into the first agent reinforcement learning model corresponding to the user side edge router, where the model comprises constraint conditions and an optimization objective: the constraints are the target computation amount and target storage task corresponding to the demand, and the objective is maximization of the model's target reward function;
and computes the first agent reinforcement learning model to obtain the initial node address corresponding to the target computing power demand.
In some embodiments, the constraint condition of the first agent reinforcement learning model specifically includes:
Determining the target computation amount and target storage task corresponding to the target computing power demand, wherein the constraint is expressed as:
$$CPU_{u,k} \le CPU_j^{avail}, \qquad Storage_{u,k} \le Storage_j^{avail}$$
where each computing power task $t_{u,k}$ in the target demand is a two-tuple $\{CPU_{u,k}, Storage_{u,k}\}$, $CPU_{u,k}$ is the target computation amount, $Storage_{u,k}$ is the target storage task, $CPU_j^{avail}$ is the available computing resource, and $Storage_j^{avail}$ is the available storage resource;
The optimization objective of the first agent reinforcement learning model specifically comprises:
determining the load of each resource node in the distributed computing power network, and determining the global load imbalance of the agents from the loads of all resource nodes, expressed as:
$$B = \frac{1}{N}\sum_{j=1}^{N}\left(\frac{Load_{cpu,j} + Load_{storage,j}}{2} - \overline{Load}\right)^2$$
where B is the global load imbalance, N is the total number of resource nodes in the distributed computing power network, j indexes the resource nodes, $Load_{cpu,j}$ is the load of the computing server at node j, $Load_{storage,j}$ is the load of the storage server at node j, and $\overline{Load}$ is the average load over all resource nodes, determined from their individual loads;
determining a collaborative reward function of the first agent reinforcement learning model according to the global load imbalance degree, the collaborative reward function being expressed by the formula:

$R_{cooperative} = \sum_{u=1}^{M}\sum_{k}\left(\alpha\, U_{t_{u,k},j} - \beta\, C_{t_{u,k},j} - \gamma\, D_{t_{u,k},j}\right) - \delta B$

wherein $R_{cooperative}$ is the collaborative reward function, M is the total number of users in the distributed computing power network, $U_{t_{u,k},j}$ is the resource utilization of computing power task $t_{u,k}$ at computing power resource node j, $C_{t_{u,k},j}$ is the usage cost for computing power task $t_{u,k}$ to complete the task at computing power resource node j, $D_{t_{u,k},j}$ is the time delay of the task, α, β and γ are preset coefficients, and δ is the penalty weight of the imbalance degree;
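A corresponding sketch of the reward computation, with field names chosen for illustration; α weighs utilization, β and γ penalize cost and delay, and δ penalizes the global imbalance:

```python
def cooperative_reward(placed_tasks, alpha, beta, gamma, delta, imbalance):
    """Collaborative reward summed over all users' placed tasks, minus the
    imbalance penalty delta * B."""
    total = sum(alpha * t["utilization"] - beta * t["cost"] - gamma * t["delay"]
                for t in placed_tasks)
    return total - delta * imbalance
```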
determining a target reward function of the first agent reinforcement learning model according to the collaborative reward function, the target reward function being expressed by the formula:

$Q_{cooperative}(S, a \mid \theta^{Q}) = \mathbb{E}\left[R_{cooperative} + \gamma\, Q_{cooperative}(S', a' \mid \theta^{Q})\right]$

wherein $Q_{cooperative}(S, a \mid \theta^{Q})$ is the target reward function, $\pi_{i}(a_{i} \mid s_{i})$ is the distributed Actor network, S is the global state set, a is the action set of the agents, $Q_{cooperative}(S, a)$ is the centralized Critic network, $\gamma\, Q_{cooperative}(S', a' \mid \theta^{Q})$ is the return of the action value function at the next state S′ with the action a′ taken by the agent in that state, $\theta^{Q}$ is the parameter of the Q network, representing the parameters of the neural network model of the action value function Q(s, a), $\theta^{\pi}$ is the parameter of the policy network, representing the parameters of the agent's policy network (i.e., the action selection mechanism), S′ is the next state, a′ is the action selected by the agent in state S′, and E is the mathematical expectation.
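In practice the expectation is estimated from sampled transitions; a minimal sketch of the resulting one-step TD target for the centralized Critic, with all names assumed:

```python
def critic_td_targets(rewards, gamma, next_q_values):
    """Per-sample targets R_cooperative + gamma * Q(S', a' | theta_Q); their
    batch mean estimates the expectation defining the target reward function.
    next_q_values are the Critic's estimates at (S', a'), where a' is chosen
    by the distributed Actor networks pi_i(a_i | s_i)."""
    return [r + gamma * q for r, q in zip(rewards, next_q_values)]
```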
For convenience of description, the above device is described as being divided by function into various modules. Of course, when implementing the present disclosure, the functions of the various modules may be implemented in the same piece or pieces of software and/or hardware.
The device of the foregoing embodiment is configured to implement the corresponding distributed computing power network task scheduling method in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiment, which is not described herein again.
Based on the same inventive concept, the present disclosure also provides an electronic device corresponding to the method of any embodiment, including a memory, a processor, and a computer program stored on the memory and capable of running on the processor, where the processor implements the distributed computing power network task scheduling method of any embodiment when executing the program.
Fig. 3 shows a more specific hardware architecture of an electronic device provided by the present embodiment, which may include a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050. Wherein processor 1010, memory 1020, input/output interface 1030, and communication interface 1040 implement communication connections therebetween within the device via a bus 1050.
The processor 1010 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application-Specific Integrated Circuit (ASIC), or one or more integrated circuits, and is configured to execute related programs to implement the technical solutions provided in the embodiments of the present disclosure.
The memory 1020 may be implemented in the form of ROM (Read-Only Memory), RAM (Random Access Memory), static storage, dynamic storage, or the like. The memory 1020 may store an operating system and other application programs; when the embodiments of the present disclosure are implemented in software or firmware, the associated program code is stored in the memory 1020 and executed by the processor 1010.
The input/output interface 1030 is used to connect with an input/output module for inputting and outputting information. The input/output module may be configured as a component in a device (not shown) or may be external to the device to provide corresponding functionality. Wherein the input devices may include a keyboard, mouse, touch screen, microphone, various types of sensors, etc., and the output devices may include a display, speaker, vibrator, indicator lights, etc.
The communication interface 1040 is used to connect a communication module (not shown) to enable communication interaction between this device and other devices. The communication module may communicate in a wired manner (such as USB or a network cable) or in a wireless manner (such as a mobile network, Wi-Fi, or Bluetooth).
Bus 1050 includes a path for transferring information between components of the device (e.g., processor 1010, memory 1020, input/output interface 1030, and communication interface 1040).
It should be noted that although the above-described device only shows the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040, and the bus 1050, in specific implementations the device may include other components necessary for proper operation. Furthermore, it will be understood by those skilled in the art that the above-described device may include only the components necessary to implement the embodiments of the present disclosure, and need not include all the components shown in the figure.
The electronic device of the foregoing embodiment is configured to implement the corresponding distributed computing power network task scheduling method in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiment, which is not described herein.
Based on the same inventive concept, corresponding to any of the above embodiments of the method, the present disclosure further provides a non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the distributed computing power network task scheduling method according to any of the above embodiments.
The computer readable media of the present embodiments include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
The storage medium of the foregoing embodiments stores computer instructions for causing the computer to execute the distributed computing power network task scheduling method according to any one of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiments, which are not described herein.
It will be appreciated that, before the technical solutions of the various embodiments of the present disclosure are used, the user may be informed in an appropriate manner of the type, scope of use, and usage scenarios of the personal information involved, and the user's authorization may be obtained.
For example, in response to receiving an active request from a user, prompt information is sent to the user to explicitly indicate that the operation the user requests to perform will require obtaining and using the user's personal information. The user can thus decide, according to the prompt information, whether to provide personal information to the software or hardware, such as an electronic device, application program, server, or storage medium, that executes the operations of the technical solution.
As an optional but non-limiting implementation, in response to receiving an active request from the user, the prompt information may be sent to the user by way of, for example, a pop-up window, in which the prompt information may be presented as text. In addition, the pop-up window may carry a selection control for the user to choose to "agree" or "disagree" to provide personal information to the electronic device.
It will be appreciated that the above-described notification and user authorization process is merely illustrative, and not limiting of the implementations of the present disclosure, and that other ways of satisfying relevant legal regulations may be applied to the implementations of the present disclosure.
It will be appreciated by persons skilled in the art that the foregoing discussion of any embodiment is merely exemplary, and is not intended to imply that the scope of the disclosure, including the claims, is limited to these examples. The steps may be implemented in any order, and many other variations of the different aspects of the disclosed embodiments exist as described above, which are not provided in detail for the sake of brevity. Within the spirit of the disclosure, features of the above embodiments, or of different embodiments, may also be combined.
Additionally, well-known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown within the provided figures, in order to simplify the illustration and discussion, and so as not to obscure the embodiments of the present disclosure. Furthermore, the devices may be shown in block diagram form in order to avoid obscuring the embodiments of the present disclosure, and this also accounts for the fact that specifics with respect to implementation of such block diagram devices are highly dependent upon the platform on which the embodiments of the present disclosure are to be implemented (i.e., such specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the disclosure, it should be apparent to one skilled in the art that embodiments of the disclosure can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative in nature and not as restrictive.
While the present disclosure has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of those embodiments will be apparent to those skilled in the art in light of the foregoing description. For example, other memory architectures (e.g., dynamic RAM (DRAM)) may be used with the embodiments discussed.
The disclosed embodiments are intended to embrace all such alternatives, modifications, and variations as fall within the broad scope of the appended claims. Accordingly, any omissions, modifications, equivalents, improvements, and the like that are within the spirit and principles of the embodiments of the disclosure are intended to be included within the scope of the disclosure.

Claims (10)

1. A distributed computing power network task scheduling method, wherein the distributed computing power network comprises a user side edge router and a computing power side edge router, the method comprising: the user side edge router acquiring a target computing power demand, and determining a current computing power state corresponding to the distributed computing power network according to a multi-protocol border gateway protocol, wherein the current computing power state represents a task allocation state of each computing power node in the current distributed computing power network; the user side edge router inputting the target computing power demand and the current computing power state into a first agent reinforcement learning model corresponding to the user side edge router, and outputting an initial node address corresponding to the target computing power demand via the first agent reinforcement learning model; the user side edge router sending the initial node address to the computing power side edge router, the computing power side edge router judging whether the initial node address needs to be updated, and determining a target node address according to the judgment result; and the computing power side edge router offloading the target computing power demand to a target node corresponding to the target node address.

2. The method according to claim 1, wherein the distributed computing power network further comprises a backbone router, and the user side edge router sending the initial node address to the computing power side edge router comprises: determining a target packet path using an open shortest path first protocol and the initial node address; judging whether a backbone router exists in at least one intermediate node included on the target packet path, the intermediate node being a node other than the initial target node corresponding to the initial node address and the current node where the user side edge router is located; in response to a backbone router existing, determining a new initial node address by the backbone router according to the target computing power demand, the backbone router sending the new initial node address to the computing power side edge router; or, in response to no backbone router existing, the user side edge router sending the initial node address to the computing power side edge router.

3. The method according to claim 2, wherein determining a new initial node address by the backbone router according to the target computing power demand comprises: obtaining, based on a message queue, the computing power state corresponding to the distributed computing power network at the current moment as a first computing power state; and inputting the target computing power demand and the first computing power state into a second agent reinforcement learning model corresponding to the backbone router, and outputting the new initial node address corresponding to the target computing power demand via the second agent reinforcement learning model.

4. The method according to claim 1, wherein the computing power side edge router judging whether the initial node address needs to be updated and determining the target node address according to the judgment result comprises: the computing power side edge router determining the computing power resource corresponding to the initial node address; judging whether the computing power resource meets the target computing power demand; in response to the computing power resource meeting the target computing power demand, determining that the initial node address does not need to be updated, and taking the initial node address as the target node address; or, in response to the computing power resource not meeting the target computing power demand, determining that the initial node address needs to be updated, the computing power side edge router updating the initial node address until the computing power resource meets the target computing power demand, obtaining an updated node address, and taking the updated node address as the target node address.

5. The method according to claim 4, wherein the computing power side edge router updating the initial node address until the computing power resource meets the target computing power demand to obtain the updated node address comprises: the computing power side edge router obtaining, based on a message queue, the computing power state corresponding to the distributed computing power network at the current moment as a second computing power state; inputting the target computing power demand and the second computing power state into a third agent reinforcement learning model corresponding to the computing power side edge router, and outputting a new node address via the third agent reinforcement learning model; and, once the computing power resource of the node corresponding to the new node address meets the target computing power demand, taking the new node address as the updated node address.

6. The method according to claim 5, further comprising, after outputting a new node address via the third agent reinforcement learning model: counting the number of updates of the initial node address; and in response to the number of updates being greater than a preset update number threshold and the computing power resource of the node corresponding to the new node address not meeting the target computing power demand, taking the node address obtained after the last update as the updated node address.

7. The method according to claim 1, wherein the user side edge router inputting the target computing power demand and the current computing power state into the first agent reinforcement learning model corresponding to the user side edge router and outputting the initial node address corresponding to the target computing power demand via the first agent reinforcement learning model comprises: the user side edge router obtaining resource node information of the distributed computing power network, and determining available computing resources and available storage resources corresponding to each resource node according to the resource node information; inputting the target computing power demand and the current computing power state into the first agent reinforcement learning model corresponding to the user side edge router, wherein the first agent reinforcement learning model comprises a constraint condition and an optimization objective, the constraint condition being the target calculation amount and target storage task corresponding to the target computing power demand, and the optimization objective being the maximization of the target reward function of the first agent reinforcement learning model; and solving the first agent reinforcement learning model to obtain the initial node address corresponding to the target computing power demand.

8. The method according to claim 7, wherein the constraint condition of the first agent reinforcement learning model specifically comprises: determining the target calculation amount and the target storage task corresponding to the target computing power demand, the constraint condition being expressed by the formula:

$CPU_{u,k} \le CPU_{j}^{avail}, \qquad Storage_{u,k} \le Storage_{j}^{avail}$

wherein each computing power task $t_{u,k}$ in the target computing power demand is represented as a binary group $\{CPU_{u,k}, Storage_{u,k}\}$, $CPU_{u,k}$ is the target calculation amount, $Storage_{u,k}$ is the target storage task, $CPU_{j}^{avail}$ is the available computing resource, and $Storage_{j}^{avail}$ is the available storage resource;

the optimization objective of the first agent reinforcement learning model specifically comprises: determining the load of each resource node in the distributed computing power network, and determining the global load imbalance degree of the agents according to the loads of all resource nodes, the global load imbalance degree being expressed by the formula:

$B = \sqrt{\dfrac{1}{N}\sum_{j=1}^{N}\left(\dfrac{Load_{cpu,j}+Load_{storage,j}}{2}-\overline{Load}\right)^{2}}$

wherein B is the global load imbalance degree, N is the total number of resource nodes in the distributed computing power network, j indexes each resource node in the distributed computing power network, $Load_{cpu,j}$ is the server load of the computing server in the distributed computing power network, $Load_{storage,j}$ is the server load of the storage server in the distributed computing power network, and $\overline{Load}$ is the average load of all resource nodes determined according to the loads of all resource nodes;

determining a collaborative reward function of the first agent reinforcement learning model according to the global load imbalance degree, the collaborative reward function being expressed by the formula:

$R_{cooperative} = \sum_{u=1}^{M}\sum_{k}\left(\alpha\, U_{t_{u,k},j} - \beta\, C_{t_{u,k},j} - \gamma\, D_{t_{u,k},j}\right) - \delta B$

wherein $R_{cooperative}$ is the collaborative reward function, M is the total number of users in the distributed computing power network, $U_{t_{u,k},j}$ is the resource utilization of computing power task $t_{u,k}$ at computing power resource node j, $C_{t_{u,k},j}$ is the usage cost for computing power task $t_{u,k}$ to complete the task at computing power resource node j, $D_{t_{u,k},j}$ is the time delay of the task, α, β and γ are preset coefficients, and δ is the penalty weight of the imbalance degree;

and determining the target reward function of the first agent reinforcement learning model according to the collaborative reward function, the target reward function being expressed by the formula:

$Q_{cooperative}(S, a \mid \theta^{Q}) = \mathbb{E}\left[R_{cooperative} + \gamma\, Q_{cooperative}(S', a' \mid \theta^{Q})\right]$

wherein $Q_{cooperative}(S, a \mid \theta^{Q})$ is the target reward function, $\pi_{i}(a_{i} \mid s_{i})$ is the distributed Actor network, S is the global state set, a is the action set of the agents, $Q_{cooperative}(S, a)$ is the centralized Critic network, $\gamma\, Q_{cooperative}(S', a' \mid \theta^{Q})$ is the return of the action value function at the next state S′ with the action a′ taken by the agent in that state, $\theta^{Q}$ is the parameter of the Q network, representing the parameters of the neural network model of the action value function Q(s, a), $\theta^{\pi}$ is the parameter of the policy network, representing the parameters of the agent's policy network (i.e., the action selection mechanism), S′ is the next state, a′ is the action selected by the agent in state S′, and E is the mathematical expectation.

9. A distributed computing power network task scheduling device, comprising: a data acquisition module, configured such that a user side edge router acquires a target computing power demand and determines a current computing power state corresponding to the distributed computing power network according to a multi-protocol border gateway protocol, wherein the current computing power state represents a task allocation state of each computing power node in the current distributed computing power network; an initial node address determination module, configured such that the user side edge router inputs the target computing power demand and the current computing power state into a first agent reinforcement learning model corresponding to the user side edge router, and outputs an initial node address corresponding to the target computing power demand via the first agent reinforcement learning model; a target node address determination module, configured such that the user side edge router sends the initial node address to a computing power side edge router, the computing power side edge router judges whether the initial node address needs to be updated, and determines a target node address according to the judgment result; and an offloading module, configured such that the computing power side edge router offloads the target computing power demand to a target node corresponding to the target node address.

10. An electronic device, comprising a memory, a processor, and a computer program stored on the memory and runnable on the processor, wherein the processor implements the method according to any one of claims 1 to 8 when executing the program.