Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.
In some of the flows described in the specification, the claims, and the drawings of this application, a number of operations are included that occur in a particular order. It should be clearly understood, however, that these operations may be performed out of the order in which they appear herein or in parallel. Operation numbers such as 101 and 102 are merely used to distinguish between different operations; the numbers themselves do not imply any order of execution. In addition, the flows may include more or fewer operations, and those operations may be performed sequentially or in parallel. It should also be noted that the terms "first", "second", etc. in this document are used to distinguish between different messages, devices, modules, and the like; they neither represent a sequential order nor require that the "first" and "second" items be of different types.
The technical solution of the embodiments of the application is mainly applied to a distributed job processing system and is used for realizing resource scheduling management in that system.
A job refers to a task submitted to a computer processing system to request that the system perform corresponding work; in a practical application, for example, a job may be a MySQL query task.
Fig. 1 is a schematic diagram of a job processing system in the prior art. As can be seen from fig. 1, the job processing system may include a resource manager 101, a job manager 102, a job node 103, and the like. The resource manager 101 is deployed to execute on a physical machine, while the job manager 102 and the job node 103 are both deployed on machine nodes 104. A machine node 104 provides physical resources such as CPUs and memory; there are a plurality of machine nodes 104, and one or more job nodes may be deployed on a single machine node 104. The resource manager 101, the job manager 102, and the job node 103 may specifically refer to processes running on a physical device.
The resource manager 101 receives a job to be processed submitted by a user, may create a job manager 102 on a machine node 104 that has available resources, and submits the job to be processed to the job manager 102 for processing. Based on the job to be processed, the job manager 102 may determine the number of job nodes that need to run and hence the number of resource copies required; each resource copy can run one job node, and one resource copy may include, for example, a 2-core CPU and 2GB of memory. The job manager 102 therefore applies to the resource manager 101 for the required number of resource copies. The resource manager 101 performs resource allocation based on the available resources of the different machine nodes and feeds back a resource allocation result (how many resource copies are allocated on which machine nodes, etc.) to the job manager 102, thereby determining the number of resource copies allocated on each machine node. The job manager 102 then creates job nodes 103 on the respective machine nodes based on the resource allocation result, so as to execute the job.
In practical applications there are often many jobs. As the cluster scale and the number of user requests increase, the processing performance of the single resource manager becomes a bottleneck, which affects job processing and wastes cluster resources because they cannot be utilized effectively.
In order to ensure normal processing of jobs and improve the utilization rate of cluster resources, the inventors arrived at the technical solution of the present application through a series of studies.
It is obvious that the embodiments described below are only a part of the embodiments of the present application, and not all of them. All other embodiments that can be derived by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present application.
An embodiment of the present application provides a resource control system. As shown in fig. 2, the resource control system may include a plurality of control nodes 201, a plurality of scheduling nodes 202 connected to each control node 201, and a requesting node 203 connected to each scheduling node 202. Each machine node is connected to one control node 201, and each control node 201 is connected to at least one machine node.
The requesting node 203 is configured to receive a job to be processed submitted by a user and to request the first scheduling node 2021 to perform scheduling processing on the job to be processed;
the first scheduling node 2021 is configured to create a job manager for the job to be processed; determine a first machine node having available resources and allocate machine resources on the first machine node for the job to be processed; send the resource allocation result of the first machine node to the corresponding first control node 2011; and receive an allocation success message fed back by the first control node 2011 and send an allocation success notification to the job manager, so that the job manager creates a job node on the first machine node;
the first control node 2011 is configured to receive the resource allocation result of the first machine node, judge whether the first machine node allows the resource allocation, and, if the resource allocation is allowed, feed back an allocation success message to the first scheduling node 2021.
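By way of a hedged illustration only, the division of labor just described might be sketched in Go as follows; every name (AllocationResult, ControlNode, SchedulingNode, Arbitrate) is hypothetical, and the network interaction between the nodes is reduced to a direct call.

```go
package main

import "fmt"

// AllocationResult is a hypothetical message carrying the resource
// allocation result that a scheduling node submits for arbitration.
type AllocationResult struct {
	JobID     string
	MachineID string
	Copies    int // number of resource copies allocated on the machine node
}

// ControlNode arbitrates allocation results for the machine nodes it manages.
type ControlNode struct {
	free map[string]int // machineID -> remaining resource copies
}

// Arbitrate judges whether the machine node still allows the allocation and,
// if so, commits it; the return value stands in for the allocation success
// message fed back to the scheduling node.
func (c *ControlNode) Arbitrate(r AllocationResult) bool {
	if c.free[r.MachineID] >= r.Copies {
		c.free[r.MachineID] -= r.Copies
		return true
	}
	return false
}

// SchedulingNode allocates resources on a machine node and submits the
// result to the control node for arbitration.
type SchedulingNode struct{ control *ControlNode }

func (s *SchedulingNode) Schedule(jobID, machineID string, copies int) {
	r := AllocationResult{JobID: jobID, MachineID: machineID, Copies: copies}
	if s.control.Arbitrate(r) { // a network round trip in the real system
		fmt.Println("allocation success: notify job manager to create job nodes")
	} else {
		fmt.Println("allocation rejected: pick another machine node")
	}
}

func main() {
	cn := &ControlNode{free: map[string]int{"machine-1": 2}}
	sn := &SchedulingNode{control: cn}
	sn.Schedule("job-A", "machine-1", 2)
}
```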
The first scheduling node may receive a resource allocation application of the job manager. The resource allocation application may include the number of resource copies applied for and the amount of resources contained in each copy. Based on the number of resource copies applied for, the first scheduling node may determine the number of resource copies to allocate on each machine node, obtaining a resource allocation result for each machine node.
Thus, optionally, the allocating, by the first scheduling node, machine resources on the first machine node for the job to be processed may include:
determining the number of resource copies to allocate on the first machine node, and generating a resource allocation result of the first machine node based on that number.
The first machine node may be any machine node having available resources, and the resource allocation result may include the number of allocated resource copies.
Furthermore, after the first control node determines, based on the resource allocation result, that resource allocation is allowed on the first machine node, it may pass the resource allocation result on to the first machine node, so that the first machine node prepares the number of resource copies specified in the result.
Because each scheduling node knows the remaining resources of all machine nodes and selects available resources from all machine nodes when performing resource allocation, multiple scheduling nodes may allocate resources on the same machine node for the jobs they have received. To ensure that resource allocation succeeds, each scheduling node therefore needs to submit the resource allocation result for each machine node to the corresponding control node for arbitration. Accordingly, the first control node receives the resource allocation result of the first machine node and judges whether the first machine node allows the resource allocation.
Optionally, the determining, by the first control node, whether the first machine node allows resource allocation may include:
judging, from the resource allocation results for the first machine node that different scheduling nodes have sent for different jobs, whether the submission time of the job to be processed is the earliest, and if so, determining that the resource allocation is allowed; otherwise, the machine resources of the first machine node are allocated to the job with the earliest submission time.
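A minimal sketch of this earliest-submission-time arbitration, assuming each competing allocation result carries the submission time of its job; all names are illustrative.

```go
package main

import (
	"fmt"
	"sort"
)

// AllocationResult stands in for the competing allocation results that
// different scheduling nodes send for the same machine node.
type AllocationResult struct {
	JobID      string
	SubmitTime int64 // submission time of the job to be processed
	Copies     int
}

// arbitrate grants the machine node's free copies in submission-time order:
// the earliest-submitted job wins, later ones succeed only if resources remain.
func arbitrate(free int, pending []AllocationResult) map[string]bool {
	sort.Slice(pending, func(i, j int) bool {
		return pending[i].SubmitTime < pending[j].SubmitTime
	})
	granted := make(map[string]bool)
	for _, r := range pending {
		if free >= r.Copies {
			free -= r.Copies
			granted[r.JobID] = true
		} else {
			granted[r.JobID] = false
		}
	}
	return granted
}

func main() {
	pending := []AllocationResult{
		{JobID: "job-B", SubmitTime: 200, Copies: 2},
		{JobID: "job-A", SubmitTime: 100, Copies: 2},
	}
	fmt.Println(arbitrate(2, pending)) // job-A (earliest) is granted, job-B is not
}
```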
In this embodiment, the resource control system provided is responsible for the resource scheduling processing in the job processing system: the control nodes are responsible for interacting with the machine nodes, and the scheduling nodes are responsible for interacting with the job managers. This avoids the situation in which a single resource manager performs all resource scheduling, where insufficient performance affects job processing and leaves cluster resources underutilized.
The resource control system of the embodiment of the present application is applied to a job processing system and is responsible for the resource scheduling processing. Therefore, as shown in fig. 3, the job processing system provided by the embodiment of the present application may be composed of the resource control system shown in fig. 2, the job manager 20, and the job node 30.
The resource control system is composed of a plurality of control nodes 201, a plurality of scheduling nodes 202 connected to each control node 201, a requesting node 203 connected to each scheduling node 202, and the like; each machine node 10 is connected to one control node 201, and each control node 201 is connected to at least one machine node 10.
As described with reference to fig. 2, after receiving the job to be processed submitted by the user, the requesting node 203 requests the first scheduling node 2021 to perform scheduling processing on it; the first scheduling node 2021 creates a job manager 20 for the job to be processed, determines a first machine node 11 having available resources, and allocates machine resources on the first machine node 11 for the job to be processed.
After it is determined that the first machine node 11 allows the resource allocation, the job manager 20 may create a job node 30 on the first machine node 11.
Of course, the job manager 20 may also create one or more job nodes on other machine nodes, based on the first scheduling node's resource allocation results for those machine nodes.
In some embodiments, as shown in fig. 2 and fig. 3, the resource control system may further include a coordinating node 204;
the coordinating node 204 is configured to obtain the load condition of each machine node as reported by the plurality of scheduling nodes 202, and to notify the plurality of scheduling nodes 202, according to the load condition of any machine node, whether resource allocation to that machine node is allowed;
accordingly, the first scheduling node 2021 determining a first machine node having available resources includes determining that the first machine node has available resources and is in a state in which allocation is allowed.
That is, each control node can report the load condition of each machine node connected to it to each scheduling node, and each scheduling node then synchronizes the load conditions of the machine nodes to the coordinating node.
In some embodiments, the coordinating node may be further configured to determine at least one machine node according to the scheduling request of the first scheduling node, and notify other scheduling nodes not including the first scheduling node to suspend resource allocation to the at least one machine node.
The scheduling request may include a machine identification of the at least one machine node, and the like.
For example, when the job to be processed is important, the first scheduling node needs the machine resources on certain machine nodes to be allocated to it preferentially, so it may initiate a scheduling request to the coordinating node to request preferential allocation of the machine resources on those machine nodes.
Optionally, the coordinating node may be further configured to receive a viewing request of a user and output the load condition of the machine node whose load is requested to be viewed.
In some embodiments, the requesting node requesting the first scheduling node to schedule the job to be processed includes:
determining, according to the load conditions of the different scheduling nodes, a first scheduling node whose load pressure meets the scheduling condition, and requesting the first scheduling node to perform scheduling processing on the job to be processed.
The scheduling condition may be, for example, that the load pressure is less than a preset pressure value.
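A minimal sketch of this selection, assuming each scheduling node reports its load pressure as a single number; the names and the threshold are illustrative.

```go
package main

import "fmt"

// chooseSchedulingNode returns a scheduling node whose load pressure is
// below the preset pressure value, i.e. one meeting the scheduling
// condition; it reports failure when every node is overloaded.
func chooseSchedulingNode(loads map[string]float64, preset float64) (string, bool) {
	for node, pressure := range loads {
		if pressure < preset {
			return node, true
		}
	}
	return "", false
}

func main() {
	loads := map[string]float64{"sched-1": 0.9, "sched-2": 0.4}
	node, ok := chooseSchedulingNode(loads, 0.5)
	fmt.Println(node, ok) // sched-2 true
}
```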
In some embodiments, the first scheduling node is further configured to generate a resource recovery result according to the number of resource copies to be recovered from the job to be processed on the first machine node, and to send the resource recovery result to the first control node;
the first control node is further configured to request the first machine node to release the corresponding resources according to the resource recovery result, and to feed back a recovery success message to the first scheduling node;
the first scheduling node is further configured to receive the recovery success message fed back by the first control node and to send a recovery success notification to the job manager, so that the job manager ends the job tasks executed by the job nodes running on the recovered resources.
While a machine node is running job nodes, part or all of its machine resources may need to be recovered because of a machine fault or for other reasons. At this time, a resource recovery result can be generated according to the number of resource copies to be recovered from the job to be processed on the first machine node; the resource recovery result may include the number of recovered resource copies.
After receiving the recovery success notification, the job manager may reinitiate a resource allocation application for the job tasks that were being executed by the job nodes running on the recovered resources.
In some embodiments, the first scheduling node is further configured to receive a resource return request sent by the job manager for the first machine node, generate a resource return result according to the number of resource copies to return in the resource return request, and send the resource return result to the first control node;
the first control node is further configured to request the first machine node to release the corresponding resources according to the resource return result of the first machine node, and to feed back a return success message to the first scheduling node.
The resource return request may be generated when the job manager detects that the job tasks in at least one job node have finished executing.
When the first scheduling node allocates resources for the job to be processed and the currently available machine resources are insufficient, it may allocate only part of the machine resources, so that part of the job tasks are executed by running part of the job nodes. It then keeps monitoring the machine resources of each machine node and, when a machine node with available resources appears, allocates resources on it to run the remaining job nodes of the job to be processed.
In the embodiment of the application, the first machine node that the first scheduling node determines to have available resources may be detected immediately after receiving the resource allocation application of the job manager, detected during execution of the job to be processed, or detected after receiving a resource allocation application reinitiated by the job manager.
As can be seen from the above description, the resource allocation result of the first machine node is sent to the first control node. The first control node may still hold an earlier, not yet processed resource allocation result of the first machine node for the same job to be processed; in that case, all resource allocation results for the first machine node may be accumulated and then processed together.
Thus, in some embodiments, the first scheduling node sending the resource allocation result of the first machine node to the first control node includes: updating, based on the resource allocation result of the first machine node, the resource allocation record of the job to be processed for the first machine node, and sending the resource allocation record to the first control node;
the first control node is specifically configured to judge, according to the resource allocation record, whether the resource allocation is allowed, and, if it is allowed, to feed back an allocation success message to the first scheduling node;
the first scheduling node receiving the allocation success message of the first control node and sending the allocation success notification to the job manager includes: receiving the allocation success message of the first control node, updating the allocated record of the job to be processed for the first machine node based on the number of successfully allocated resource copies, and clearing the successfully allocated resource copies from the resource allocation record; and sending, based on the allocated record, an allocation success notification to the job manager and clearing the allocated record. The allocation success notification may include the number of allocated resource copies in the allocated record, and the allocation success message may include the number of successfully allocated resource copies.
That is, for the job to be processed, the first scheduling node may record the resource scheduling conditions on the different machine nodes. These records include the resource allocation record, which holds the number of to-be-confirmed allocated resource copies that are to be sent to the first control node. If a resource allocation record of the job to be processed for the first machine node already exists, it may be updated based on the resource allocation result of the first machine node, which includes the number of allocated resource copies; for example, if the resource allocation result contains 1 resource copy and the existing resource allocation record for the first machine node also records 1 resource copy, the record holds 2 resource copies after the update. Of course, if no resource allocation record of the job to be processed for the first machine node exists yet, one may be generated based on the resource allocation result of the first machine node.
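The update-or-create behavior of the resource allocation record can be sketched as follows; the map-based representation and all names are assumptions.

```go
package main

import "fmt"

// recordKey identifies the records of one job on one machine node.
type recordKey struct{ jobID, machineID string }

// pendingAssign plays the role of the resource allocation records: per job
// and machine node, the number of to-be-confirmed allocated resource copies
// waiting to be sent to the control node.
var pendingAssign = map[recordKey]int{}

// recordAllocation updates the record if it exists and creates it otherwise,
// matching the description above: an existing 1-copy record plus a 1-copy
// allocation result yields a 2-copy record.
func recordAllocation(jobID, machineID string, copies int) {
	pendingAssign[recordKey{jobID, machineID}] += copies // creates the entry on first use
}

func main() {
	recordAllocation("job-A", "machine-1", 1)
	recordAllocation("job-A", "machine-1", 1)
	fmt.Println(pendingAssign[recordKey{"job-A", "machine-1"}]) // 2
}
```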
The resource allocation record may be stored in a resource allocation field, which is denoted by the string "toCommitAssign" in the following section.
What is sent to the first control node is thus, specifically, the resource allocation record, so that the first control node judges whether to allow the resource allocation based on the number of resource copies in the resource allocation record.
On receiving the allocation success message of the first control node, the first scheduling node may determine that the resource allocation succeeded, and the successfully allocated resource copies may then be cleared from the resource allocation record; for example, if 5 resource copies are recorded in the resource allocation record and 4 of them are allocated successfully, the record becomes 1 resource copy. The successfully allocated resource copies in "toCommitAssign" are at the same time transferred to the allocated record.
The first scheduling node may receive multiple allocation success messages for the job to be processed on the first machine node, some of which may not yet have been notified to the job manager. The recorded resource scheduling conditions may therefore further include an allocated record, which holds the number of resource copies whose allocation the first control node has confirmed. The first scheduling node thus first updates the allocated record after receiving an allocation success message; of course, if no allocated record of the job to be processed for the first machine node exists yet, one may be generated based on the resource allocation record of the first machine node.
The allocated record may be saved in an allocated field, which is denoted by the string "toAssign" in the following.
The first scheduling node may send an allocation success notification to the job manager based on the allocated record and clear the allocated record, that is, clear "toAssign", after sending the notification.
In addition, in some embodiments, the resource scheduling conditions recorded by the first scheduling node for the job to be processed on the different machine nodes may further include a resource recovery record.
The first scheduling node sending the resource recovery result of the first machine node to the first control node may include:
updating, based on the resource recovery result of the first machine node, the resource recovery record of the job to be processed for the first machine node, and sending the resource recovery record to the first control node;
the first control node is specifically configured to request the first machine node to release the corresponding resources according to the resource recovery record, and to feed back a recovery success message to the first scheduling node;
the first scheduling node receiving the recovery success message and sending a recovery success notification to the job manager includes: receiving the recovery success message of the first control node, updating the recovered record of the job to be processed for the first machine node based on the number of successfully recovered resource copies, and clearing the successfully recovered resource copies from the resource recovery record; and sending, based on the recovered record, a recovery success notification to the job manager and clearing the recovered record. The recovery success notification may include the number of recovered resource copies in the recovered record, and the recovery success message includes the number of successfully recovered resource copies.
Of course, if no resource recovery record of the job to be processed for the first machine node exists, one may be generated based on the resource recovery result of the first machine node.
The resource recovery result includes the number of resource copies to recover, so the resource recovery record likewise holds the number of to-be-confirmed recovered resource copies that are to be sent to the first control node.
The resource recovery record may be saved in a resource recovery field, which is denoted by the string "toCommitRevoke" in the following.
What is sent to the first control node is thus, specifically, the resource recovery record, so that the first control node requests the first machine node to release the corresponding machine resources based on the number of resource copies in the resource recovery record.
On receiving the recovery success message of the first control node, the first scheduling node may consider the resource recovery successful, and the successfully recovered resource copies may then be cleared from the resource recovery record, that is, transferred from the "toCommitRevoke" field to the recovered record.
The first scheduling node may receive multiple recovery success messages for the job to be processed on the first machine node, some of which may not yet have been notified to the job manager. The recorded resource scheduling conditions may therefore further include a recovered record, which holds the number of resource copies whose recovery the first control node has confirmed. The first scheduling node thus first updates the recovered record after receiving a recovery success message; of course, if no recovered record of the job to be processed for the first machine node exists yet, one may be generated based on the resource recovery record of the first machine node.
The recovered record may be saved in a recovered field, which is denoted by the string "toRelease" in the following.
The first scheduling node may send a recovery success notification to the job manager based on the recovered record and clear the recovered record, that is, clear the "toRelease" field, after sending the notification.
In addition, in some embodiments, the resource scheduling conditions recorded by the first scheduling node for the job to be processed on the different machine nodes may further include a resource return record.
The sending, by the first scheduling node, of the resource return result to the first control node may include: updating, based on the resource return result of the first machine node, the resource return record of the job to be processed for the first machine node, and sending the resource return record to the first control node;
the first control node is specifically configured to request the first machine node to release the corresponding resources according to the resource return record, and to feed back a return success message to the first scheduling node;
the first scheduling node is further configured to receive the return success message and clear the successfully returned resource copies from the resource return record. The return success message may include the number of successfully returned resource copies.
The resource return record holds the number of to-be-confirmed returned resource copies that are to be sent to the first control node. If a resource return record of the job to be processed for the first machine node exists, it is updated accordingly; if none exists, one may be generated based on the resource return result of the first machine node.
The resource return record may be stored in a resource return field, which is denoted by the string "toCommitReturn" in the following.
What is sent to the first control node is the resource return record, so that the first control node requests the first machine node to release the corresponding resources based on the number of resource copies in the resource return record.
On receiving the return success message of the first control node, the first scheduling node can determine that the resource copies recorded in the resource return record have been returned successfully, and the successfully returned resource copies may then be cleared from the record. Since the machine resources whose return the job manager requests are always returned successfully, the job manager does not need to be notified of the resource return condition.
In addition, since the first scheduling node may generate resource allocation results, resource recovery results, and resource return results, in some embodiments it may further be configured to update the resource scheduling record of the job to be processed for the first machine node based on those results.
The resource scheduling record may record the total number of resource copies allocated to the job to be processed on the first machine node.
A resource allocation result is accumulated into the resource scheduling record, while the corresponding portions of a resource recovery result and a resource return result are deducted from it.
The resource scheduling record may be stored in a resource scheduling field, which is denoted by the string "RunningQueue" in the following.
As can be seen from the above description, the first scheduling node may maintain the following fields for the resource scheduling condition of the job to be processed on the first machine node:
resource scheduling field: RunningQueue, which saves the resource scheduling record, i.e. the total number of resource copies allocated to the job to be processed on the first machine node;
resource allocation field: toCommitAssign, which saves the resource allocation record, i.e. the number of to-be-confirmed allocated resource copies to be sent to the first control node;
resource recovery field: toCommitRevoke, which saves the resource recovery record, i.e. the number of to-be-confirmed recovered resource copies to be sent to the first control node;
resource return field: toCommitReturn, which saves the resource return record, i.e. the number of to-be-confirmed returned resource copies to be sent to the first control node;
allocated field: toAssign, which saves the allocated record, i.e. the number of resource copies whose allocation the first control node has confirmed;
recovered field: toRelease, which saves the recovered record, i.e. the number of resource copies whose recovery the first control node has confirmed.
If the first scheduling node finds that toCommitAssign is not empty, it sends the resource allocation record in toCommitAssign to the first control node; if it finds that toCommitRevoke is not empty, it sends the resource recovery record in toCommitRevoke to the first control node; if it finds that toCommitReturn is not empty, it sends the resource return record in toCommitReturn to the first control node.
If the first scheduling node receives an allocation success message, it transfers the successfully allocated resource copies from toCommitAssign to toAssign; if it receives a recovery success message, it transfers the successfully recovered resource copies from toCommitRevoke to toRelease.
If the first scheduling node finds that toAssign is not empty, it sends an allocation success notification to the job manager; if it finds that toRelease is not empty, it sends a recovery success notification to the job manager.
As can be seen from the interaction processes during resource allocation, resource recovery, and resource return described above, resource allocation, resource recovery, and resource return may take place on the first machine node for the job to be processed at the same time. The operations involved (sending records to the first control node, receiving arbitration messages from it, and sending notifications to the job manager) may likewise occur simultaneously and may interleave, so collision events are likely and the number of interactions may grow, wasting resources. In order to reduce collision events and avoid wasting resources while still guaranteeing accurate resource scheduling, several possible situations are described below.
In one possible implementation scenario:
the first scheduling node sending the resource allocation record to the first control node may include:
judging whether a resource recovery record of the job to be processed for the first machine node exists;
if it exists and has not been sent to the first control node: when the number of resource copies to recover in the resource recovery record is greater than the number of resource copies to allocate in the resource allocation record, clearing the resource allocation record and updating the resource recovery record based on the number of resource copies to allocate; when it is smaller, clearing the resource recovery record, updating the resource allocation record based on the number of resource copies to recover, and sending the updated resource allocation record to the first control node; when the two are equal, clearing both the resource allocation record and the resource recovery record;
and if it does not exist, sending the resource allocation record to the first control node.
That is, the first scheduling node may first update RunningQueue and toCommitAssign and prepare to send the resource allocation record in toCommitAssign to the first control node. If it then finds that toCommitRevoke is not empty and has not yet been sent to the first control node, and the number of resource copies to recover in toCommitRevoke is greater than the number of resource copies to allocate in toCommitAssign, toCommitAssign may be cleared; since toCommitAssign is then empty, no resource allocation record needs to be sent to the first control node, that is, no allocation needs to be performed. toCommitRevoke is updated based on the number of resource copies to allocate, that is, that number is subtracted from the original number of resource copies to recover in toCommitRevoke.
If the number of resource copies to recover is smaller than the number of resource copies to allocate, toCommitRevoke may be cleared; since toCommitRevoke is then empty, no resource recovery record needs to be sent to the first control node, that is, no recovery needs to be performed. toCommitAssign is updated based on the number of resource copies to recover, that is, that number is subtracted from the original number of resource copies to allocate in toCommitAssign.
If the number of resource copies to recover equals the number of resource copies to allocate, both toCommitAssign and toCommitRevoke may be cleared: the recovery and the allocation cancel each other out. Since neither a resource allocation record nor a resource recovery record remains, neither resource allocation nor resource recovery needs to be performed, and no notification is sent to the job manager.
If no resource recovery record of the job to be processed for the first machine node exists, the resource allocation record can be sent to the first control node directly.
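A minimal sketch of this three-way offsetting between a pending allocation and a pending recovery, assuming the records reduce to per-copy counters; the function and parameter names are illustrative, and the same logic applies symmetrically in the mirrored scenario described later.

```go
package main

import "fmt"

// offsetPending cancels a to-be-confirmed allocation against a
// to-be-confirmed recovery for the same job and machine node, following the
// three branches described above, and returns what still has to be sent to
// the control node.
func offsetPending(toCommitAssign, toCommitRevoke int) (sendAssign, sendRevoke int) {
	switch {
	case toCommitRevoke > toCommitAssign:
		// recovery outweighs allocation: nothing to allocate,
		// only the remainder still has to be recovered
		return 0, toCommitRevoke - toCommitAssign
	case toCommitRevoke < toCommitAssign:
		// allocation outweighs recovery: nothing to recover,
		// only the remainder still has to be allocated
		return toCommitAssign - toCommitRevoke, 0
	default:
		// equal: allocation and recovery cancel out, nothing is sent
		return 0, 0
	}
}

func main() {
	fmt.Println(offsetPending(3, 1)) // 2 0: send an allocation record for 2 copies
	fmt.Println(offsetPending(1, 3)) // 0 2: send a recovery record for 2 copies
	fmt.Println(offsetPending(2, 2)) // 0 0: fully cancelled, nothing is sent
}
```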
In another possible implementation scenario:
the first scheduling node is further configured to send the resource allocation record to the first control node if the resource recovery record exists and has already been sent to the first control node.
That is, when the first scheduling node has updated RunningQueue and toCommitAssign and is preparing to send the resource allocation record in toCommitAssign to the first control node, it may find that toCommitRevoke is not empty but has already been sent to the first control node. The first scheduling node cannot determine whether the first control node has processed the resource recovery record; it can only assume that the first control node has processed it but that the recovery success message has not yet arrived. In this case the resource allocation record in toCommitAssign still needs to be sent to the first control node for allocation scheduling.
In yet another possible implementation scenario:
the sending of an allocation success notification to the job manager based on the allocated record, and the clearing of the allocated record, may include:
judging whether a recovered record of the job to be processed for the first machine node exists;
if it exists: when the number of allocated resource copies in the allocated record equals the number of recovered resource copies in the recovered record, clearing both the allocated record and the recovered record; when the number of allocated resource copies is greater, clearing the recovered record, updating the allocated record based on the number of recovered resource copies, sending an allocation success notification to the job manager based on the updated allocated record, and clearing the allocated record; when the number of allocated resource copies is smaller, clearing the allocated record and updating the recovered record based on the number of allocated resource copies;
if it does not exist, sending an allocation success notification to the job manager based on the allocated record, and clearing the allocated record.
That is, the first scheduling node may first update RunningQueue and toCommitAssign and prepare to send the resource allocation record in toCommitAssign to the first control node. If it finds that toRelease is not empty, the first control node has already processed a resource recovery record sent by the first scheduling node and the recovery succeeded, but the first scheduling node has not yet sent the recovery success notification to the job manager. The resource allocation record in toCommitAssign may then be sent to the first control node, and when the allocation success message of the first control node is received, toAssign may be updated based on the resource allocation record.
At this time, if the number of allocated resource copies in toAssign equals the number of recovered resource copies in toRelease, toAssign and toRelease cancel each other out and no notification needs to be sent to the job manager. If the number of allocated resource copies is greater, the recovered record is cleared and the allocated record is updated based on the number of recovered resource copies, that is, the number of recovered resource copies is subtracted from the original number of allocated resource copies; an allocation success notification is then sent to the job manager based on the updated allocated record. If the number of allocated resource copies is smaller, the allocated record is cleared and the recovered record is updated based on the number of allocated resource copies, that is, the number of allocated resource copies is subtracted from the original number of recovered resource copies.
If toRelease is empty, the resource allocation record may be sent to the first control node directly; after the allocated record is obtained, an allocation success notification is sent to the job manager based on the allocated record, and the allocated record is cleared.
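The confirmed-side offsetting between toAssign and toRelease can be sketched the same way; this is again an illustration with invented names, covering both this scenario and its mirror described further below.

```go
package main

import "fmt"

// offsetConfirmed cancels confirmed-but-unnotified allocations (toAssign)
// against confirmed-but-unnotified recoveries (toRelease) before notifying
// the job manager, following the three branches described above; the return
// values are the copy counts to report in the notifications.
func offsetConfirmed(toAssign, toRelease int) (notifyAssign, notifyRelease int) {
	switch {
	case toAssign > toRelease:
		return toAssign - toRelease, 0 // only an allocation success notification
	case toAssign < toRelease:
		return 0, toRelease - toAssign // only a recovery success notification
	default:
		return 0, 0 // cancel out: no notification is sent at all
	}
}

func main() {
	fmt.Println(offsetConfirmed(5, 2)) // 3 0: notify allocation success for 3 copies
	fmt.Println(offsetConfirmed(2, 2)) // 0 0: no notification
}
```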
In yet another possible implementation scenario:
the sending, by the first scheduling node, of the resource recovery record to the first control node may include:
judging whether a resource allocation record of the job to be processed for the first machine node exists;
if it exists and has not been sent to the first control node: when the number of resource copies to allocate in the resource allocation record is greater than the number of resource copies to recover in the resource recovery record, clearing the resource recovery record and updating the resource allocation record based on the number of resource copies to recover; when it is smaller, clearing the resource allocation record, updating the resource recovery record based on the number of resource copies to allocate, and sending the updated resource recovery record to the first control node; when the two are equal, clearing both the resource allocation record and the resource recovery record;
and if it does not exist, sending the resource recovery record to the first control node.
That is, the first scheduling node may first update RunningQueue and toCommitRevoke and prepare to send the resource recovery record in toCommitRevoke to the first control node. If it then finds that toCommitAssign is not empty and has not yet been sent to the first control node, and the number of resource copies to recover in toCommitRevoke equals the number of resource copies to allocate in toCommitAssign, both toCommitAssign and toCommitRevoke may be cleared: the recovery and the allocation cancel each other out.
If the number of resource copies to recover is greater than the number of resource copies to allocate, toCommitAssign may be cleared; since toCommitAssign is then empty, no resource allocation record needs to be sent to the first control node, that is, no allocation needs to be performed. toCommitRevoke is updated based on the number of resource copies to allocate, that is, that number is subtracted from the original number of resource copies to recover in toCommitRevoke, and the updated resource recovery record is sent to the first control node;
if the number of resource copies to recover is smaller than the number of resource copies to allocate, toCommitRevoke may be cleared; since toCommitRevoke is then empty, no resource recovery record needs to be sent to the first control node, that is, no recovery needs to be performed. toCommitAssign is updated based on the number of resource copies to recover, that is, that number is subtracted from the original number of resource copies to allocate in toCommitAssign.
If no resource allocation record of the job to be processed for the first machine node exists, the resource recovery record can be sent to the first control node directly.
In yet another possible implementation scenario:
the first scheduling node may further be configured to send the resource recovery record to the first control node if a resource allocation record of the job to be processed for the first machine node exists and has already been sent to the first control node.
That is, if toCommitAssign is found not to be empty but already sent to the first control node, the first scheduling node cannot determine whether the first control node has processed the resource allocation record; it can only assume that the first control node has processed it but that the allocation success message has not yet arrived. In this case the resource recovery record in toCommitRevoke still needs to be sent to the first control node for recovery scheduling.
In yet another possible implementation scenario:
the first scheduling node sending a recovery success notification to the job manager based on the recovered record, and clearing the recovered record, includes:
judging whether an allocated record of the job to be processed for the first machine node exists;
if it exists: when the number of recovered resource copies in the recovered record is greater than the number of allocated resource copies in the allocated record, clearing the allocated record, updating the recovered record based on the number of allocated resource copies, sending a recovery success notification to the job manager based on the updated recovered record, and clearing the recovered record; when the number of recovered resource copies is smaller, clearing the recovered record and updating the allocated record based on the number of recovered resource copies; when the two are equal, clearing both the recovered record and the allocated record;
if it does not exist, sending a recovery success notification to the job manager based on the recovered record, and clearing the recovered record.
That is, the first scheduling node may first update RunningQueue and toCommitRevoke and prepare to send the resource recovery record in toCommitRevoke to the first control node. If it finds that toAssign is not empty, the first control node has already processed a resource allocation record and the first scheduling node has obtained the allocation success message, but has not yet sent the allocation success notification to the job manager. The resource recovery record in toCommitRevoke may then be sent to the first control node, and when the recovery success message of the first control node is received, toRelease may be updated based on the resource recovery record.
At this time, if the number of allocated resource copies in toAssign equals the number of recovered resource copies in toRelease, toAssign and toRelease cancel each other out and no notification needs to be sent to the job manager. If the number of allocated resource copies is greater, the recovered record is cleared and the allocated record is updated based on the number of recovered resource copies, that is, the number of recovered resource copies is subtracted from the original number of allocated resource copies; an allocation success notification is then sent to the job manager based on the updated allocated record, and the allocated record is cleared. If the number of allocated resource copies is smaller, the allocated record is cleared and the recovered record is updated based on the number of allocated resource copies, that is, the number of allocated resource copies is subtracted from the original number of recovered resource copies; a recovery success notification is then sent to the job manager based on the updated recovered record, and the recovered record is cleared.
If toAssign is empty, the resource recovery record may be sent to the first control node directly; after the recovered record is obtained, a recovery success notification is sent to the job manager based on the recovered record, and the recovered record is cleared.
In yet another possible implementation scenario:
the sending, by the first scheduling node, of the resource recovery record to the first control node may include:
judging whether a resource return record of the job to be processed for the first machine node exists;
if it exists and has not been sent to the first control node: when the number of resource copies to return in the resource return record is greater than the number of resource copies to recover in the resource recovery record, clearing the resource recovery record; when it is smaller, clearing the resource return record;
and if it does not exist, sending the resource recovery record to the first control node.
That is, the first scheduling node may first update RunningQueue and toCommitRevoke and prepare to send the resource recovery record in toCommitRevoke to the first control node. If it then finds that toCommitReturn is not empty and has not yet been sent to the first control node, and the number of resource copies to return in toCommitReturn is greater than or equal to the number of resource copies to recover in toCommitRevoke, toCommitRevoke may be cleared: the returned resources cancel the recovered resources, only the resource return is performed, and no resource recovery record needs to be sent to the first control node.
If the number of resource copies to return is smaller than the number of resource copies to recover, the recovered resources cancel the returned resources and only the resource recovery needs to be performed, so toCommitReturn may be cleared and no resource return record needs to be sent to the first control node.
If the number of resource copies to return equals the number of resource copies to recover, either toCommitReturn or toCommitRevoke may be cleared.
In yet another possible implementation scenario:
the first scheduling node is further configured, if a resource return record of the job to be processed for the first machine node exists and has already been sent to the first control node: to clear the resource recovery record if the number of resource copies to return in the resource return record is greater than or equal to the number of resource copies to recover in the resource recovery record; and, if the number of resource copies to return is smaller than the number of resource copies to recover, to update the resource recovery record based on the number of resource copies to return and send the updated resource recovery record to the first control node.
That is, if toCommitReturn is found not to be empty but already sent to the first control node, then, for the first scheduling node: if the number of resource copies to return in toCommitReturn is greater than or equal to the number of resource copies to recover in toCommitRevoke, toCommitRevoke may be cleared and no resource recovery record needs to be sent to the first control node; if the number of resource copies to return is smaller, the resource recovery record is updated based on the number of resource copies to return, that is, that number is subtracted from the number of resource copies to recover in the record, and the updated resource recovery record is sent to the first control node.
In yet another possible implementation scenario:
the first scheduling node sending the resource return record to the first control node may include:
judging whether a resource recovery record of the job to be processed for the first machine node exists;
if it exists and has not been sent to the first control node: when the number of resource copies to recover in the resource recovery record is greater than the number of resource copies to return in the resource return record, clearing the resource return record; when the number of resource copies to return is greater than the number of resource copies to recover, clearing the resource recovery record and sending the resource return record to the first control node;
and if it does not exist, sending the resource return record to the first control node.
In yet another possible implementation scenario:
the first scheduling node is further configured, if a resource recovery record of the job to be processed for the first machine node exists and has already been sent to the first control node: to clear the resource return record if the number of resource copies to recover in the resource recovery record is greater than or equal to the number of resource copies to return in the resource return record; and, if the number of resource copies to recover is smaller than the number of resource copies to return, to update the resource return record based on the number of resource copies to recover and send the updated resource return record to the first control node.
As can be seen from the above description, in the job processing system of the embodiment of the present application, network communication takes place between the requesting node and the scheduling nodes, between the scheduling nodes and the control nodes, and between the control nodes and the machine nodes, and many messages may be transmitted continuously. In order to improve the quality of this network communication and thereby guarantee job processing efficiency, the inventors, after a series of studies, propose that wherever two of these nodes act as sending end and receiving end for each other, the network communication may be performed according to the following first predetermined communication manner.
The first predetermined communication manner includes:
the sending end caches a sending message to be sent into a first sending queue corresponding to the receiving end;
the sending end detects whether the first sending queue is in a first state, the initial state of the first sending queue being the first state;
if the first sending queue is in the first state, the sending end sends the at least one sending message currently cached in the first sending queue to the receiving end and switches the first sending queue to a second state; the receiving end processes the at least one sending message in sequence, according to the order in which the messages were cached;
and the sending end receives a reply message of the receiving end for the at least one sending message, clears the at least one sending message from the first sending queue, and switches the first sending queue back to the first state.
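A possible reading of this state machine in Go, with the transport reduced to a callback; the queue, state, and function names are all illustrative assumptions.

```go
package main

import (
	"fmt"
	"sync"
)

type queueState int

const (
	stateFirst  queueState = iota // the first state: allowed to send
	stateSecond                   // the second state: a batch is in flight
)

// sendQueue sketches the first predetermined communication manner on the
// sending end; the actual network transport is abstracted into a callback.
type sendQueue struct {
	mu      sync.Mutex
	state   queueState
	pending []string // cached sending messages, in caching order
}

// buffer caches a sending message and, if the queue is in the first state,
// sends everything currently cached and switches to the second state.
func (q *sendQueue) buffer(msg string, send func(batch []string)) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.pending = append(q.pending, msg)
	if q.state == stateFirst {
		send(q.pending)
		q.state = stateSecond
	}
}

// onReply clears the acknowledged messages and switches the queue back to
// the first state; a real implementation would then re-check the queue and
// send any messages cached while the batch was in flight.
func (q *sendQueue) onReply(acked int) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.pending = q.pending[acked:]
	q.state = stateFirst
}

func main() {
	q := &sendQueue{}
	send := func(batch []string) { fmt.Println("send batch:", batch) }
	q.buffer("allocate 2 copies", send) // sent at once: queue was in the first state
	q.buffer("revoke 1 copy", send)     // cached only: queue is in the second state
	q.onReply(1)                        // reply received: back to the first state
	q.buffer("return 1 copy", send)     // sends the two messages cached meanwhile
}
```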
Optionally, if the sending end does not receive a reply message of the receiving end for the at least one message within a predetermined time after sending the at least one sending message currently cached in the first sending queue to the receiving end, the sending end may also force to send the at least one sending message currently cached in the first sending queue to the receiving end.
Optionally, in the first predetermined communication mode, the sending end may further set a message sequence number for each sent message buffered in the sending queue according to the buffering order; and the receiving end sequentially processes the at least one sending message according to the caching sequence indicated by the message sequence number of the at least one sending message.
Optionally, the sending end may set the message sequence numbers for the sending messages buffered in the sending queue according to a buffering order, and may set the message sequence numbers for the sending messages buffered in the first sending queue in sequence by using consecutive numbers starting from the number 1 according to the buffering order.
Optionally, the sending end may further set a first sending field for the first sending queue; wherein the first sending field is used for storing the maximum message sequence number among the messages cached in the first sending queue.
In this case, the receiving, by the sending end, of the reply message from the receiving end for the at least one sending message, the clearing of the at least one sending message from the first sending queue, and the switching of the first sending queue to the first state may include:
receiving at least one reply message and a second receiving field value sent by the receiving end; wherein the second receiving field is used for storing the maximum message sequence number among the sending messages sent by the sending end that have been processed by the receiving end;
and, if the second receiving field value is the same as the first sending field value, switching the first sending queue to the first state and clearing, from the first sending queue, the sending messages corresponding to the message sequence numbers smaller than or equal to the second receiving field value.
Optionally, if the sending end determines that the second receiving field value is different from the first sending field value, the sending end may clear, from the first sending queue, the sending messages corresponding to the message sequence numbers smaller than or equal to the second receiving field value.
Optionally, the sending end may further set a first receiving field for the first sending queue; wherein the first receiving field is used for storing the maximum message sequence number among the reply messages sent by the receiving end and received by the sending end; and the receiving end maintains a second sending queue for the sending end, which is used for caching reply messages to be sent to the sending end.
When the at least one sending message currently cached in the first sending queue is sent to the receiving end, the first receiving field value is sent to the receiving end along with it, so that the receiving end can clear, from the second sending queue, the reply messages corresponding to the message sequence numbers smaller than or equal to the first receiving field value.
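For illustration only, the sequence numbers and the first sending, second receiving, and first receiving fields described above may be sketched as follows (the field names mirror the description; the class name SeqSendQueue and the exact message shapes are hypothetical):

    class SeqSendQueue:
        def __init__(self):
            self.buffer = {}            # message sequence number -> sending message
            self.next_seq = 1           # consecutive numbers starting from 1
            self.first_send_field = 0   # max sequence number cached in this queue
            self.first_recv_field = 0   # max sequence number of replies received
            self.state = "first"

        def cache(self, message):
            self.buffer[self.next_seq] = message
            self.first_send_field = self.next_seq
            self.next_seq += 1

        def outgoing_batch(self):
            # The batch carries the first receiving field value so that the
            # receiving end can clear acknowledged replies from its own
            # (second) sending queue.
            return list(self.buffer.items()), self.first_recv_field

        def on_reply(self, reply_seqs, second_recv_field):
            # second_recv_field is the max sequence number, among this queue's
            # sending messages, that the receiving end has processed.
            for seq in list(self.buffer):
                if seq <= second_recv_field:
                    del self.buffer[seq]
            if reply_seqs:
                self.first_recv_field = max(self.first_recv_field, max(reply_seqs))
            # The queue returns to the first state only when every cached
            # message has been processed by the receiving end, i.e. when the
            # two field values compare equal.
            if second_recv_field == self.first_send_field:
                self.state = "first"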
Optionally, the sending end may maintain respective corresponding sending queues for different receiving ends respectively.
For example, in practical applications, one control node may maintain communication with thousands of machine nodes, and a scheduling node may likewise maintain communication with multiple control nodes, so the communication pressure is very heavy; moreover, the communications between a control node and different machine nodes are independent and do not interfere with each other. Therefore, to further improve communication performance, the sending end may maintain a corresponding sending queue for each different receiving end.
In addition, the sending end may also send messages of different message types to the same receiving end; for example, the scheduling node may send a resource allocation result, a resource recovery result, and a resource return result to the control node, which are messages of three different message types.
Therefore, the sending end may maintain, for the same receiving end, a plurality of first sending queues respectively corresponding to the different message types (an illustrative sketch follows the steps below);
in this case, the caching, by the sending end, of a sending message to be sent to the first sending queue corresponding to the receiving end may include:
determining the message type of the sending message to be sent;
determining, from the plurality of sending queues maintained for the receiving end, the first sending queue corresponding to the message type;
and caching the sending message to be sent to the first sending queue corresponding to the message type.
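A minimal sketch of maintaining one first sending queue per pair of receiving end and message type, reusing the hypothetical SendQueue from the earlier sketch, might look like this (TypedSender and the transport factory are likewise hypothetical):

    class TypedSender:
        def __init__(self, make_transport):
            self.make_transport = make_transport  # assumed factory per receiver
            self.queues = {}  # (receiver_id, message_type) -> SendQueue

        def cache(self, receiver_id, message_type, message):
            # Determine the queue corresponding to the message type of the
            # sending message, creating the queue on first use.
            key = (receiver_id, message_type)
            if key not in self.queues:
                self.queues[key] = SendQueue(self.make_transport(receiver_id))
            self.queues[key].cache(message)

Because each (receiving end, message type) pair has its own queue, a slow reply for one message type does not block batches of the other types.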
Optionally, the receiving end sequentially processes, according to the caching order, the at least one sending message sent by the sending end, so as to obtain one or more reply messages;
the receiving end may cache the one or more reply messages in the second sending queue,
so that the receiving end can send at least one reply message currently cached in the second sending queue to the sending end.
Optionally, the receiving end detects whether the second sending queue is in the first state, and switches the second sending queue to the first state when receiving a sending message of the sending end;
and, if the second sending queue is in the first state, the receiving end sends at least one reply message currently cached in the second sending queue to the sending end and switches the second sending queue to the second state.
Optionally, the receiving end may also set message sequence numbers for the reply messages cached in the second sending queue in sequence, with consecutive numbers starting from the number 1 according to the caching order.
Optionally, the receiving end may further set a second sending field for the second sending queue; wherein the second sending field is used for storing the maximum message sequence number among the reply messages currently cached in the second sending queue.
In this case, the receiving, by the receiving end, of the at least one sending message currently cached in the first sending queue and sent by the sending end may include:
receiving the at least one sending message currently cached in the first sending queue and the first receiving field value sent by the sending end; wherein the first receiving field value stores the maximum message sequence number among the reply messages sent by the receiving end and received by the sending end;
and clearing, from the second sending queue, the reply messages corresponding to the message sequence numbers less than or equal to the first receiving field value.
In addition, as noted above, the sending end may send messages of multiple different message types to the same receiving end, such as the resource allocation result, the resource recovery result, and the resource return result sent by the scheduling node to the control node.
Therefore, in some embodiments, when the request node and the scheduling node, the scheduling node and the control node, and the control node and the machine node respectively act as a sending end and a receiving end for each other, network communication may alternatively be performed according to a second predetermined communication manner as follows:
the sending end determines a sending message to be sent;
the sending end searches for a third sending queue in the first state among a plurality of sending queues; wherein the initial state of the third sending queue is the first state;
the sending end caches the sending message in the third sending queue;
the sending end sends at least one sending message currently cached in the third sending queue to the receiving end, and switches the third sending queue to a second state;
and the sending end receives a reply message from the receiving end for the at least one sending message, clears the at least one sending message from the third sending queue, and switches the third sending queue to the first state.
Optionally, if the sending end does not receive the reply message from the receiving end within a predetermined time after sending the at least one sending message to the receiving end, the sending end forcibly resends the at least one sending message currently cached in the third sending queue to the receiving end.
Optionally, if the sending end determines that the plurality of sending queues are all in the second state, the sending end may select any one of the sending queues, cache the sending message in that sending queue, and send at least one sending message currently cached in that sending queue to the receiving end; upon receiving a reply message for the at least one sending message, the sending end switches that sending queue to the first state.
Optionally, the sending end may set the message sequence numbers for the sending messages cached in the third sending queue in sequence, with consecutive numbers starting from the number 1 according to the caching order.
Optionally, the sending end may set a first sending field for the third sending queue; wherein the first sending field is used for storing the maximum message sequence number among the messages cached in the third sending queue.
In this case, the receiving, by the sending end, of the reply message from the receiving end for the at least one sending message, the clearing of the sending messages from the third sending queue, and the switching of the third sending queue to the first state may include:
receiving at least one reply message and a second receiving field value sent by the receiving end; wherein the second receiving field is used for storing the maximum message sequence number among the sending messages sent by the sending end that have been processed by the receiving end;
and, if the second receiving field value is the same as the first sending field value, switching the third sending queue to the first state and clearing, from the third sending queue, the sending messages corresponding to the message sequence numbers smaller than or equal to the second receiving field value.
Optionally, the sending end may further set a first receiving field for the third sending queue; wherein the first receiving field is used for storing the maximum message sequence number among the reply messages sent by the receiving end and received by the sending end; and the receiving end maintains a fourth sending queue for the sending end, which is used for caching reply messages to be sent to the sending end.
When the at least one sending message currently cached in the third sending queue is sent to the receiving end, the first receiving field value is sent to the receiving end along with it, so that the receiving end can clear, from the fourth sending queue, the reply messages corresponding to the message sequence numbers less than or equal to the first receiving field value.
Optionally, the sending end may further set an available resource list based on the queue identifiers of the plurality of sending queues;
when the plurality of sending queues are in the initial state, the queue identifiers of the plurality of sending queues are sequentially stored in the available resource list;
after the sending end caches the sending message in the third sending queue and switches the third sending queue to the second state, the queue identifier of the third sending queue may be deleted from the available resource list; after the third sending queue is switched back to the first state, its queue identifier may be stored in the available resource list again.
The searching, by the sending end, for the third sending queue in the first state among the plurality of sending queues may then be: selecting any queue identifier from the available resource list, and determining the sending queue indicated by that queue identifier as the third sending queue, as sketched below.
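As a non-limiting illustration, the queue pool and available resource list of the second predetermined communication manner might be sketched as follows (QueuePool and the transport object are hypothetical; SeqSendQueue is the illustrative class given earlier):

    import random

    class QueuePool:
        def __init__(self, queue_count):
            self.queues = {qid: SeqSendQueue() for qid in range(queue_count)}
            # Initially every queue is in the first state, so every queue
            # identifier is stored in the available resource list.
            self.available = list(self.queues)

        def cache_and_send(self, message, transport):
            if self.available:
                # Select any queue identifier from the available resource list.
                qid = self.available.pop(0)
            else:
                # All queues are in the second state: select any queue anyway
                # (the fallback described above).
                qid = random.choice(list(self.queues))
            queue = self.queues[qid]
            queue.cache(message)
            transport.send(qid, queue.outgoing_batch())
            queue.state = "second"
            return qid

        def on_reply(self, qid, reply_seqs, second_recv_field):
            queue = self.queues[qid]
            queue.on_reply(reply_seqs, second_recv_field)
            if queue.state == "first" and qid not in self.available:
                # A queue switched back to the first state has its identifier
                # stored in the available resource list again.
                self.available.append(qid)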
Optionally, the receiving end sequentially processes the at least one sending message according to the caching order of the at least one sending message to obtain one or more reply messages; the one or more reply messages may be cached in the fourth sending queue, so that at least one reply message currently cached in the fourth sending queue can be sent to the sending end.
Corresponding to the resource control system described above, an embodiment of the present application further provides a resource control method, as shown in fig. 4; the resource control method shown in fig. 4 is executed by the request node, and the method may include the following steps:
401: receiving a to-be-processed job submitted by a user.
402: a first scheduling node is determined from a plurality of scheduling nodes.
Optionally, determining the first scheduling node from the plurality of scheduling nodes may include:
determining, according to the load conditions of the different scheduling nodes, a first scheduling node whose load pressure meets a scheduling condition, so as to request the first scheduling node to perform scheduling processing on the job to be processed.
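As a simple, non-limiting sketch of such load-based selection (the load probe load_of and the threshold are hypothetical, since the embodiment does not fix a particular scheduling condition):

    def pick_first_scheduling_node(scheduling_nodes, load_of, threshold):
        # Keep only the scheduling nodes whose load pressure meets the
        # scheduling condition, then pick the least loaded one.
        candidates = [n for n in scheduling_nodes if load_of(n) <= threshold]
        return min(candidates, key=load_of) if candidates else None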
403: sending a scheduling processing request to the first scheduling node, so that the first scheduling node creates a job manager for the job to be processed, determines a first machine node having available resources, allocates machine resources in the first machine node for the job to be processed, sends the resource allocation result of the first machine node to the corresponding first control node, and, upon receiving an allocation success message fed back by the first control node, sends an allocation success notification to the job manager, so that the job manager creates a job node in the first machine node. The first control node is configured to judge whether the first machine node allows the resource allocation and, if so, to feed back the allocation success message to the first scheduling node.
Corresponding to the resource control system described above, an embodiment of the present application further provides a resource control method, as shown in fig. 5; the resource control method shown in fig. 5 is executed by a scheduling node, and the method may include the following steps:
501: receiving a scheduling processing request sent by a request node.
The scheduling processing request is sent after the request node receives a job to be processed submitted by a user and determines a first scheduling node.
502: creating a job manager for the job to be processed.
503: determining a first machine node having available resources, and allocating machine resources in the first machine node for the job to be processed.
504: sending the resource allocation result of the first machine node to the corresponding first control node.
505: receiving an allocation success message fed back by the first control node, and sending an allocation success notification to the job manager, so that the job manager creates a job node in the first machine node.
The first control node is configured to determine whether the first machine node allows resource allocation, and if so, feed back an allocation success message.
In some embodiments, allocating machine resources in the first machine node for the job to be processed comprises: determining the number of resource allocation copies in the first machine node; and generating the resource allocation result of the first machine node based on the number of resource allocation copies.
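A minimal sketch of generating such a resource allocation result follows (the record shape and names are hypothetical; a resource copy could be, for example, a 2-core CPU with 2 GB of memory, as in the earlier description):

    def allocate_on_node(free_copies, copies_needed, machine_node_id, job_id):
        # Determine the number of resource allocation copies in this machine
        # node and generate the resource allocation result from it.
        allocated = min(free_copies, copies_needed)
        return {"job": job_id,
                "machine_node": machine_node_id,
                "allocated_copies": allocated}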
In certain embodiments, the method may further comprise:
generating a resource recovery result according to the number of resource copies of the job to be processed that are to be recovered in the first machine node; sending the resource recovery result to the first control node, so that the first control node requests the first machine node to release the corresponding resources according to the resource recovery result and feeds back a recovery success message to the first scheduling node;
and receiving the recovery success message fed back by the first control node, and sending a recovery success notification to the job manager, so that the job manager finishes the job tasks executed by the job nodes running in the recovered resources.
In certain embodiments, the method may further comprise:
generating a resource return result according to the number of resource return copies in a resource return request; and sending the resource return result to the first control node, so that the first control node requests the first machine node to release the corresponding resources according to the resource return result of the first machine node and feeds back a return success message to the first scheduling node.
In some embodiments, sending the resource allocation result of the first machine node to the first control node comprises:
updating a resource allocation record corresponding to the job to be processed in the first machine node based on the resource allocation result of the first machine node;
sending the resource allocation record to the first control node, so that the first control node judges, according to the resource allocation record, whether resource allocation is allowed and, if so, feeds back an allocation success message to the first scheduling node;
the receiving of the allocation success message of the first control node and the sending of the allocation success notification to the job manager include:
receiving the allocation success message of the first control node;
updating the allocated record of the job to be processed for the first machine node based on the number of successfully allocated resource copies, and clearing the number of successfully allocated resource copies from the resource allocation record;
and sending an allocation success notification to the job manager based on the allocated record, and clearing the allocated record.
In some embodiments, sending the resource recovery result of the first machine node to the first control node comprises:
updating the resource recovery record corresponding to the job to be processed in the first machine node based on the resource recovery result of the first machine node; and sending the resource recovery record to the first control node, so that the first control node requests the first machine node to release the corresponding resources according to the resource recovery record and feeds back a recovery success message to the first scheduling node;
the receiving of the recovery success message and the sending of the recovery success notification to the job manager include:
receiving the recovery success message of the first control node, updating the corresponding recovered record of the job to be processed in the first machine node based on the number of successfully recovered resource copies, and clearing the number of successfully recovered resource copies from the resource recovery record;
and sending a recovery success notification to the job manager based on the recovered record, and clearing the recovered record.
In some embodiments, sending the resource allocation record to the first control node comprises:
judging whether a resource recovery record of the job to be processed for the first machine node exists;
if the resource recovery record exists but has not been sent to the first control node: when the number of resource recovery copies in the resource recovery record is larger than the number of resource allocation copies in the resource allocation record, clearing the resource allocation record and updating the resource recovery record based on the number of resource allocation copies; when the number of resource recovery copies in the resource recovery record is smaller than the number of resource allocation copies in the resource allocation record, clearing the resource recovery record, updating the resource allocation record based on the number of resource recovery copies, and sending the updated resource allocation record to the first control node; and when the number of resource recovery copies in the resource recovery record is equal to the number of resource allocation copies in the resource allocation record, clearing both the resource allocation record and the resource recovery record;
and, if no such resource recovery record exists, sending the resource allocation record to the first control node.
In certain embodiments, the method may further comprise:
and if the resource recovery record exists and has already been sent to the first control node, sending the resource allocation record to the first control node.
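The netting of an unsent resource recovery record against a resource allocation record described above can be sketched as follows (illustrative Python only; the three branches correspond to the greater-than, less-than, and equal cases, and the same pattern applies symmetrically to the allocated/recovered records and to the recovery/return records in the embodiments below):

    def net_allocation_against_recovery(alloc_copies, recovery_copies):
        # Returns (allocation copies still to send, recovery copies kept).
        if recovery_copies > alloc_copies:
            # Clear the allocation record; keep the surplus recovery copies.
            return 0, recovery_copies - alloc_copies
        if recovery_copies < alloc_copies:
            # Clear the recovery record; send only the net allocation copies.
            return alloc_copies - recovery_copies, 0
        # Equal numbers cancel out: clear both records.
        return 0, 0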
In some embodiments, sending an allocation success notification to the job manager based on the allocated record, and clearing the allocated record, comprises:
judging whether a recovered record of the job to be processed for the first machine node exists;
if so: when the number of allocated resource copies in the allocated record is equal to the number of recovered resource copies in the recovered record, clearing both the allocated record and the recovered record; when the number of allocated resource copies in the allocated record is greater than the number of recovered resource copies in the recovered record, clearing the recovered record, updating the allocated record based on the number of recovered resource copies, sending an allocation success notification to the job manager based on the updated allocated record, and clearing the allocated record; and when the number of allocated resource copies in the allocated record is smaller than the number of recovered resource copies in the recovered record, clearing the allocated record and updating the recovered record based on the number of allocated resource copies;
and, if not, sending an allocation success notification to the job manager based on the allocated record, and clearing the allocated record.
In some embodiments, sending the resource recovery record to the first control node comprises:
judging whether a resource allocation record of the job to be processed for the first machine node exists;
if the resource allocation record exists but has not been sent to the first control node: when the number of resource allocation copies in the resource allocation record is larger than the number of resource recovery copies in the resource recovery record, clearing the resource recovery record and updating the resource allocation record based on the number of resource recovery copies; when the number of resource allocation copies in the resource allocation record is smaller than the number of resource recovery copies in the resource recovery record, clearing the resource allocation record, updating the resource recovery record based on the number of resource allocation copies, and sending the updated resource recovery record to the first control node; and when the number of resource allocation copies in the resource allocation record is equal to the number of resource recovery copies in the resource recovery record, clearing both the resource allocation record and the resource recovery record;
and, if no such resource allocation record exists, sending the resource recovery record to the first control node.
In certain embodiments, the method may further comprise:
and if the resource allocation record corresponding to the job to be processed in the first machine node exists and has already been sent to the first control node, sending the resource recovery record to the first control node.
In some embodiments, sending a recovery success notification to the job manager based on the recovered record, and clearing the recovered record, comprises:
judging whether an allocated record of the job to be processed for the first machine node exists;
if so: when the number of recovered resource copies in the recovered record is larger than the number of allocated resource copies in the allocated record, clearing the allocated record, updating the recovered record based on the number of allocated resource copies, sending a recovery success notification to the job manager based on the updated recovered record, and clearing the recovered record; when the number of recovered resource copies in the recovered record is smaller than the number of allocated resource copies in the allocated record, clearing the recovered record and updating the allocated record based on the number of recovered resource copies; and when the number of recovered resource copies in the recovered record is equal to the number of allocated resource copies in the allocated record, clearing both the recovered record and the allocated record;
and, if not, sending a recovery success notification to the job manager based on the recovered record, and clearing the recovered record.
In some embodiments, sending the resource recovery record to the first control node comprises:
judging whether a resource return record of the job to be processed for the first machine node exists;
if the resource return record exists but has not been sent to the first control node: when the number of resource return copies in the resource return record is larger than the number of resource recovery copies in the resource recovery record, clearing the resource recovery record; and when the number of resource return copies in the resource return record is smaller than the number of resource recovery copies in the resource recovery record, clearing the resource return record and sending the resource recovery record to the first control node;
and, if no such resource return record exists, sending the resource recovery record to the first control node.
In certain embodiments, the method may further comprise:
if a resource return record of the job to be processed for the first machine node exists and has already been sent to the first control node: when the number of resource return copies in the resource return record is larger than or equal to the number of resource recovery copies in the resource recovery record, clearing the resource recovery record;
and, when the number of resource return copies in the resource return record is smaller than the number of resource recovery copies in the resource recovery record, updating the resource recovery record based on the number of resource return copies and sending the updated resource recovery record to the first control node.
In some embodiments, sending the resource return record to the first control node comprises:
judging whether a resource recovery record of the job to be processed for the first machine node exists;
if the resource recovery record exists but has not been sent to the first control node: when the number of resource recovery copies in the resource recovery record is larger than the number of resource return copies in the resource return record, clearing the resource return record; and when the number of resource return copies in the resource return record is larger than the number of resource recovery copies in the resource recovery record, clearing the resource recovery record and sending the resource return record to the first control node;
and, if no such resource recovery record exists, sending the resource return record to the first control node.
In certain embodiments, the method may further comprise:
if a resource recovery record of the job to be processed for the first machine node exists and has already been sent to the first control node: when the number of resource recovery copies in the resource recovery record is larger than or equal to the number of resource return copies in the resource return record, clearing the resource return record;
and, when the number of resource recovery copies in the resource recovery record is smaller than the number of resource return copies in the resource return record, updating the resource return record based on the number of resource recovery copies and sending the updated resource return record to the first control node.
Corresponding to the resource control system described above, an embodiment of the present application further provides a resource control method, as shown in fig. 6; the resource control method shown in fig. 6 is executed by a control node, and the method may include the following steps:
601: receiving a resource allocation result of the first machine node sent by the first scheduling node.
The resource allocation result is obtained by the first scheduling node determining a first machine node having available resources and allocating machine resources in the first machine node for the job to be processed; the job to be processed is received by a request node, which requests the first scheduling node to perform scheduling processing on it.
602: judging whether the first machine node allows resource allocation; if so, executing step 603; otherwise, ending the process.
603: feeding back an allocation success message to the first scheduling node, so that the first scheduling node sends an allocation success notification to the job manager.
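As an illustrative sketch of the judgment in step 602 (the capacity bookkeeping kept by the control node is assumed, since the embodiment does not prescribe a concrete admission rule):

    def allows_allocation(node_capacity, copies_already_allocated, requested_copies):
        # The control node admits the allocation only if the first machine
        # node still has enough free resource copies to honour it.
        return copies_already_allocated + requested_copies <= node_capacity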
Fig. 4 to fig. 6 describe the resource control method provided in the embodiments of the present application from the perspectives of different execution subjects, namely the request node, the scheduling node, and the control node. The operations specifically executed by the request node, the scheduling node, and the control node have been described in detail in the system embodiment above; for their implementation principles and technical effects, reference may be made to the above description, and details are not repeated here.
Corresponding to the resource control method shown in fig. 4, an embodiment of the present application further provides a resource control apparatus; as shown in fig. 7, the apparatus may include:
a job receiving module 701, configured to receive a to-be-processed job submitted by a user;
a node determining module 702, configured to determine a first scheduling node from a plurality of scheduling nodes;
a request processing module 703, configured to send a scheduling processing request to the first scheduling node, so that the first scheduling node creates a job manager for the job to be processed, determines a first machine node having available resources, allocates machine resources in the first machine node for the job to be processed, sends the resource allocation result of the first machine node to the corresponding first control node, and, upon receiving an allocation success message fed back by the first control node, sends an allocation success notification to the job manager, so that the job manager creates a job node in the first machine node; the first control node is configured to judge whether the first machine node allows the resource allocation and, if so, to feed back the allocation success message to the first scheduling node.
The resource control apparatus shown in fig. 7 may execute the resource control method of the embodiment shown in fig. 4; the implementation principle and the technical effects are not repeated here.
In one possible design, the resource control apparatus of the embodiment shown in fig. 7 may be implemented as a computing device, which may include a storage component 801 and a processing component 802 as shown in fig. 8;
the storage component 801 stores one or more computer instructions for the processing component 802 to invoke and execute.
The processing component 802 is configured to:
receiving a job to be processed submitted by a user;
determining a first scheduling node from a plurality of scheduling nodes;
sending a scheduling processing request to the first scheduling node, so that the first scheduling node creates a job manager for the job to be processed, determines a first machine node having available resources, allocates machine resources in the first machine node for the job to be processed, sends the resource allocation result of the first machine node to the corresponding first control node, and, upon receiving an allocation success message fed back by the first control node, sends an allocation success notification to the job manager, so that the job manager creates a job node in the first machine node; the first control node is configured to judge whether the first machine node allows the resource allocation and, if so, to feed back the allocation success message to the first scheduling node.
The processing component 802 may include one or more processors that execute computer instructions to perform all or some of the steps of the above method. Of course, the processing component may also be implemented as one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components configured to perform the above method.
The storage component 801 is configured to store various types of data to support operations in the computing device. The storage component may be implemented by any type of volatile or non-volatile memory device, or a combination thereof, such as static random-access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, or a magnetic or optical disk.
Of course, the computing device may further include other components, such as an input/output interface and a communication component.
The input/output interface provides an interface between the processing component and peripheral interface modules, which may be output devices, input devices, and the like.
The communication component is configured to facilitate wired or wireless communication between the computing device and other devices.
The computing device may be a physical device or an elastic computing host provided by a cloud computing platform; when the computing device is a cloud server, the processing component, the storage component, and the like may be basic server resources rented or purchased from the cloud computing platform.
An embodiment of the present application further provides a computer-readable storage medium storing a computer program; when the computer program is executed by a computer, the resource control method of the embodiment shown in fig. 4 may be implemented.
Corresponding to the resource control method shown in fig. 5, an embodiment of the present application further provides a resource control apparatus; as shown in fig. 9, the apparatus may include:
a request receiving module 901, configured to receive a scheduling processing request sent by a request node; the scheduling processing request is sent after the request node receives a job to be processed submitted by a user and determines the first scheduling node;
a creating module 902, configured to create a job manager for the job to be processed;
a resource allocation module 903, configured to determine a first machine node having available resources, and allocate machine resources in the first machine node for the job to be processed;
a result sending module 904, configured to send a resource allocation result of the first machine node to a corresponding first control node;
a notification module 905, configured to receive an allocation success message fed back by the first control node, and send an allocation success notification to the job manager, so that the job manager creates a job node in the first machine node; the first control node is configured to determine whether the first machine node allows resource allocation, and if so, feed back an allocation success message.
The resource control apparatus shown in fig. 9 may execute the resource control method of the embodiment shown in fig. 5; the implementation principle and the technical effects are not repeated here.
In one possible design, the resource control apparatus of the embodiment shown in fig. 9 may be implemented as a computing device, which may include a storage component 1001 and a processing component 1002 as shown in fig. 10;
the storage component 1001 stores one or more computer instructions for the processing component 1002 to invoke for execution.
The processing component 1002 is configured to:
receiving a scheduling processing request sent by the request node; the scheduling processing request is sent after the request node receives a job to be processed submitted by a user and determines a first scheduling node;
creating a job manager for the job to be processed;
determining a first machine node with available resources, and allocating machine resources in the first machine node for the job to be processed;
sending the resource allocation result of the first machine node to a corresponding first control node;
receiving an allocation success message fed back by the first control node, and sending an allocation success notification to the job manager so that the job manager can create a job node in the first machine node; the first control node is configured to determine whether the first machine node allows resource allocation, and if so, feed back an allocation success message.
The processing component 1002 may include one or more processors that execute computer instructions to perform all or some of the steps of the above method. Of course, the processing component may also be implemented as one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components configured to perform the above method.
The storage component 1001 is configured to store various types of data to support operations in the computing device. The storage component may be implemented by any type of volatile or non-volatile memory device, or a combination thereof, such as static random-access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, or a magnetic or optical disk.
Of course, the computing device may further include other components, such as an input/output interface and a communication component.
The input/output interface provides an interface between the processing component and peripheral interface modules, which may be output devices, input devices, and the like.
The communication component is configured to facilitate wired or wireless communication between the computing device and other devices.
The computing device may be a physical device or an elastic computing host provided by a cloud computing platform; when the computing device is a cloud server, the processing component, the storage component, and the like may be basic server resources rented or purchased from the cloud computing platform.
An embodiment of the present application further provides a computer-readable storage medium storing a computer program; when the computer program is executed by a computer, the resource control method of the embodiment shown in fig. 5 may be implemented.
Corresponding to the resource control method shown in fig. 6, an embodiment of the present application further provides a resource control apparatus; as shown in fig. 11, the apparatus may include:
a result receiving module 1101, configured to receive a resource allocation result of the first machine node sent by the first scheduling node; the resource allocation result is obtained by the first scheduling node determining a first machine node having available resources and allocating machine resources in the first machine node for the job to be processed; the job to be processed is received by a request node, which requests the first scheduling node to perform scheduling processing on it;
a resource determining module 1102, configured to determine whether the first machine node allows resource allocation;
a result feedback module 1103, configured to feed back, if resource allocation is allowed, an allocation success message to the first scheduling node, so that the first scheduling node sends an allocation success notification to the job manager.
The resource control apparatus shown in fig. 11 may execute the resource control method of the embodiment shown in fig. 6; the implementation principle and the technical effects are not repeated here.
In one possible design, the resource control apparatus of the embodiment shown in fig. 11 may be implemented as a computing device, which may include a storage component 1201 and a processing component 1202 as shown in fig. 12;
the storage component 1201 stores one or more computer instructions for the processing component 1202 to invoke for execution.
The processing component 1202 is configured to:
receiving a resource allocation result of the first machine node sent by the first scheduling node; the resource allocation result is obtained by the first scheduling node determining a first machine node having available resources and allocating machine resources in the first machine node for the job to be processed; the job to be processed is received by a request node, which requests the first scheduling node to perform scheduling processing on it;
judging whether the first machine node allows resource allocation;
and if the resource allocation is allowed, feeding back an allocation success message to the first scheduling node so that the first scheduling node sends an allocation success notification to the job manager.
The processing component 1202 may include one or more processors that execute computer instructions to perform all or some of the steps of the above method. Of course, the processing component may also be implemented as one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components configured to perform the above method.
The storage component 1201 is configured to store various types of data to support operations in the computing device. The storage component may be implemented by any type of volatile or non-volatile memory device, or a combination thereof, such as static random-access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, or a magnetic or optical disk.
Of course, the computing device may further include other components, such as an input/output interface and a communication component.
The input/output interface provides an interface between the processing component and peripheral interface modules, which may be output devices, input devices, and the like.
The communication component is configured to facilitate wired or wireless communication between the computing device and other devices.
The computing device may be a physical device or an elastic computing host provided by a cloud computing platform; when the computing device is a cloud server, the processing component, the storage component, and the like may be basic server resources rented or purchased from the cloud computing platform.
An embodiment of the present application further provides a computer-readable storage medium storing a computer program; when the computer program is executed by a computer, the resource control method of the embodiment shown in fig. 6 may be implemented.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.