Background
Because a cloud computing system is diversified and shares resources among a very large user base, the number of tasks and the volume of data to be processed at any moment in the cloud environment are also very large. Scheduling the resources and tasks in the cloud environment reasonably and efficiently, guaranteeing quality of service for users, and improving both task execution efficiency and resource utilization are therefore key and difficult points in cloud computing research. The physical resource nodes in a cloud computing system are heterogeneous, node states change dynamically as resources are invoked, resources in the system are continuously allocated and released, and new resources may join the current cloud system as the system or the task requirements change; in addition, task scheduling in different application scenarios has different characteristics, so the complexity of scheduling algorithms in a cloud environment is extremely high. Therefore, formulating an effective task scheduling strategy for a specific application scenario according to the characteristics of that scenario, and establishing a reasonable mapping between task nodes and resource nodes, can shorten task execution time and improve task execution efficiency.
In recent years, with the further development of cloud computing technology and the wide adoption of its business service models, researchers have carried out extensive work on task scheduling algorithms in the cloud environment. After sorting and induction, the main concerns of task scheduling research in the cloud environment are: the execution efficiency of the task scheduling algorithm, the quality-of-service control of task scheduling in the cloud computing environment, and the economic benefit of the cloud service provider. The existing task scheduling algorithms fall mainly into the following categories. (1) Heuristic task scheduling algorithms: these mainly include the particle swarm algorithm, the ant colony algorithm, the gravitational search algorithm, the genetic algorithm, and the like. Such scheduling algorithms map tasks to appropriate resources according to changes in resource state during scheduling, completing resource allocation and task execution; many studies have improved on these basic ideas to raise their performance and their applicability in specific scenarios. (2) Scheduling algorithms based on QoS constraints: these algorithms satisfy the quality of service of user task scheduling as far as possible, taking the deadline, the scheduling budget, reliability, security, and the like as scheduling constraints so as to meet the user's preferences when selecting resources. (3) Scheduling algorithms based on the revenue of the cloud service provider: these algorithms consider scheduling from the provider's perspective and, on the premise that task scheduling is completed, utilize resources to the greatest extent or reduce resource consumption, thereby improving the provider's benefit.
When a task is scheduled, that is, when task nodes are mapped to resource nodes, the scheduling effect is closely related to the computation cost and the communication cost of scheduling. Cloud users expect a response shortly after a task is submitted to the cloud platform, while cloud providers need to deliver good quality of service with limited resources, so from both perspectives the response time of a task is an important index of task scheduling. In data processing services, a common way to shorten the processing time of a task and make full use of idle resources in the system is to fragment the data associated with the task and distribute the fragments to several idle resource nodes for execution. However, there is still no good scheduling strategy for deciding how to fragment, that is, how many fragments to create, how much data each fragment should contain, and which resource node each fragment should be matched with. The conventional approach is average fragmentation: the data to be processed is divided evenly according to the number of available resources, and the resulting fragments are distributed to the resource nodes for execution.
Disclosure of Invention
The invention provides a resource requisition scheduling method for data fragmentation, which can reasonably fragment big data, requisition resources for the fragments, and make full use of idle available resources for data processing.
The resource requisition scheduling method for data fragmentation of the invention comprises the following steps:
A. Determining the maximum number of data fragments DNi into which task Ti can be split:

DNi = Di / MSi

where i is the task number, Di is the amount of data to be processed by task Ti, and MSi is the minimum divisible granularity of the data of task Ti;
B. Screening the available resources, and selecting m resources from the n resource nodes to add to a pre-scheduling set, where m = DNi;
C. According to the pre-slicing proportionality coefficients λj′, determining the scheduling time FFTij of each data fragment allocated to each pre-scheduled resource node in the pre-scheduling set. The pre-slicing proportionality coefficient is a coefficient assigned evenly, or preset, to the data fragments while the computing power and network transmission capability of the nodes are not yet known; it is then used for each pre-scheduled resource in the calculation of the scheduling time FFTij below;
D. Calculating the final completion time TFTi of task Ti as the maximum, over the resource nodes, of the times at which the data fragments of task Ti are completed on their respective resource nodes;
E. From the final completion time TFTi of task Ti, calculating the completion time FFi of task Ti and the data-fragment proportionality coefficients λj of task Ti;
F. Secondary slicing: taking the largest value λk among the data-fragment proportionality coefficients λj, and obtaining the actual fragment size Dik* of data fragment Sik of task Ti;
G. Updating the remaining data amount as Di* = Di* − Dik* and the remaining number of divisible fragments as m* = m* − 1, where m is the original number of fragments; if Di* = 0 or m* = 0, the data slicing is completed and the sizes of the remaining pre-fragments are set to 0; otherwise, the largest proportionality coefficient λk is set to 0 and step F is executed again;
H. After data fragmentation is completed, obtaining the actual fragmentation strategy Si* = {Si1*, Si2*, …, Sij*, …, Sim*} of task Ti, where data fragment Sij* has size Dij* and each value Dij* is obtained by the above calculation; if Dij* ≠ 0, the data fragment Sij* of size Dij* is scheduled to resource node Rj for data processing; if Dij* = 0, the data fragment Sij* is empty and no resource is allocated, where j = 1 … m.
Through the secondary fragmentation, the invention can make full use of the performance of the available resource nodes in resource requisition scheduling; a reasonable data fragmentation strategy is adopted and a suitable resource node is requisitioned for each data fragment to perform data processing, so that the overall completion time of the task is minimized and the execution efficiency of the task is effectively improved.
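For illustration only, the following minimal Python sketch shows one way steps A and B might look in code; the ResourceNode structure, the function names, and the ranking rule used to pick the m pre-scheduled nodes are assumptions made for this example and are not prescribed by the invention.

```python
import math
from dataclasses import dataclass

@dataclass
class ResourceNode:
    """Assumed description of an idle resource node screened in step B."""
    name: str
    compute: float   # R_j(C): computing power, e.g. MB of task data processed per second
    network: float   # R_j(N): network transmission performance, e.g. MB per second
    delay: float     # c_j: communication delay to the task scheduler, in seconds

def max_fragments(data_amount: float, min_granularity: float) -> int:
    """Step A: DN_i = D_i / MS_i (taking the integer part here is an assumption)."""
    return max(1, math.floor(data_amount / min_granularity))

def preschedule(nodes: list[ResourceNode], dn_i: int) -> list[ResourceNode]:
    """Step B: screen the n available nodes and keep m = DN_i of them.

    The ranking criterion is an assumption; here nodes with higher combined
    compute and network capability are preferred when n > DN_i.
    """
    ranked = sorted(nodes, key=lambda r: r.compute + r.network, reverse=True)
    return ranked[:dn_i]

if __name__ == "__main__":
    nodes = [ResourceNode(f"R{j}", compute=50.0 + 10 * j, network=40.0 + 5 * j, delay=0.1 * j)
             for j in range(1, 7)]                                 # n = 6 idle nodes, made-up figures
    dn = max_fragments(data_amount=100.0, min_granularity=25.0)    # DN_i = 4
    pre_set = preschedule(nodes, dn)                               # m = 4 pre-scheduled nodes
    print(dn, [r.name for r in pre_set])
```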
Specifically, the scheduling time FFTij of each pre-scheduled resource node in step C is calculated as:

FFTij = cj + λj′·Di/Rj(N) + λj′·Di/Rj(C)

where Di is the amount of data to be processed by task Ti, Rj(C) is the computing power of resource node Rj, λj′ is the pre-slicing proportionality coefficient of data fragment Sij, cj is the communication delay between the task scheduler and resource node Rj, and Rj(N) is the network performance of resource node Rj.
Specifically, the final completion time TFTi of task Ti in step D is calculated as:

TFTi = max_{1≤j≤m} FFTij = max_{1≤j≤m} [cj + λj′·Di/Rj(N) + λj′·Di/Rj(C)]

where m is the number of resources that can be requisitioned, Di is the amount of data to be processed by task Ti, Rj(C) is the computing power of resource node Rj, λj′ is the pre-slicing proportionality coefficient of data fragment Sij, cj is the communication delay between the task scheduler and resource node Rj, and Rj(N) is the network performance of resource node Rj.
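As an illustrative sketch of the two formulas above (not part of the claimed method), the following Python fragment computes FFTij and TFTi; the node figures and the equal pre-slicing coefficients λj′ = 1/m are assumed values.

```python
def fft(d_i: float, lam: float, c_j: float, r_n: float, r_c: float) -> float:
    """FFT_ij = c_j + λ_j'·D_i / R_j(N) + λ_j'·D_i / R_j(C): scheduling time on node j."""
    return c_j + lam * d_i / r_n + lam * d_i / r_c

def tft(d_i: float, lams: list[float], nodes: list[tuple[float, float, float]]) -> float:
    """TFT_i = max over the m pre-scheduled nodes of FFT_ij; nodes hold (c_j, R_j(N), R_j(C))."""
    return max(fft(d_i, lam, c, rn, rc) for lam, (c, rn, rc) in zip(lams, nodes))

# Assumed example: D_i = 100, m = 4 pre-scheduled nodes, equal pre-slicing coefficients of 0.25.
nodes = [(0.2, 50.0, 100.0), (0.1, 40.0, 80.0), (0.3, 60.0, 120.0), (0.2, 45.0, 90.0)]
print(tft(100.0, [0.25] * 4, nodes))
```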
Specifically, in step E the completion time FFi of task Ti is calculated from the final completion time TFTi of task Ti as:

FFi = (Di + Σ_{k=1..m} ck·Rk(C)·Rk(N)/(Rk(C)+Rk(N))) / (Σ_{k=1..m} Rk(C)·Rk(N)/(Rk(C)+Rk(N)))

where m is the number of resources that can be requisitioned, Di is the amount of data to be processed by task Ti, Rj(C) is the computing power of resource node Rj, Rj(N) is its network performance, and cj is the communication delay between the task scheduler and resource node Rj; and the proportionality coefficient λj of data fragment Sij is:

λj = (FFi − cj)·Rj(C)·Rj(N) / (Di·(Rj(C)+Rj(N)))

where ck is the communication delay between the task scheduler and resource node Rk, Rk(C) is the computing performance of resource node Rk, and Rk(N) is the network transmission performance of resource node Rk; these quantities enter λj through FFi above.
Specifically, the actual fragment size Dik* of data fragment Sik of task Ti in step F is obtained by adjusting λk·Di to the minimum divisible granularity of the data, where m is the number of resources that can be requisitioned.
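As a purely illustrative numeric example with assumed figures: if the largest coefficient is λk = 0.34, Di = 100 MB and the minimum divisible granularity is MSi = 8 MB, then λk·Di = 34 MB is not an integer multiple of 8 MB, so the actual fragment Dik* is adjusted to a multiple of the granularity, for example 32 MB under a round-down rule, and the difference is carried over into the remaining data amount Di* handled in step G.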
The resource requisition scheduling method for data fragments of the invention can make full use of the performance of available resource nodes in resource requisition scheduling; it adopts a reasonable data fragmentation strategy and requisitions a suitable resource node for each data fragment to perform data processing, so that the overall completion time of the task is minimized and the execution efficiency of the task is effectively improved.
The present invention will be described in further detail with reference to the following examples. This should not be understood as limiting the scope of the above-described subject matter of the present invention to the following examples. Various substitutions and alterations according to the general knowledge and conventional practice in the art are intended to be included within the scope of the present invention without departing from the technical spirit of the present invention as described above.
Detailed Description
As shown in fig. 1, the resource requisition scheduling method for data fragmentation of the invention includes:
A. Determining the maximum number of data fragments DNi into which task Ti can be split:

DNi = Di / MSi

where i is the task number, Di is the amount of data to be processed by task Ti, and MSi is the minimum divisible granularity of the data of task Ti.
B. Screening the available resources, and selecting m resources from the n resource nodes to add to a pre-scheduling set, where m = DNi.
In a cloud computing environment, after a user submits tasks to the cloud platform, the task scheduler in the cloud platform selects appropriate resources for the submitted task queue and executes the tasks according to the formulated task scheduling strategy. When the cloud environment processes a task with a large data volume, resource requisition is needed: the idle, scattered resources in the cloud environment are fully utilized, the data associated with the data-processing task are divided into several data fragments, and the fragments are then dispatched to the resource nodes for processing according to the scheduling strategy, so that the overall execution efficiency of the task is improved. The data-slicing task scheduling model is shown in fig. 2.
C. When data fragment Sij of task Ti is transmitted from the task scheduler to resource node Rj, the transmission duration of the data fragment is the communication delay plus the data transmission delay. With the communication delay denoted cj and the network performance of resource node Rj denoted Rj(N), the transmission time FTTij of data fragment Sij is:

FTTij = cj + λj′·Di/Rj(N)

where λj′ is the pre-slicing proportionality coefficient. When data fragment Sij is scheduled on resource node Rj, according to the computing power Rj(C) of resource node Rj, the time FDPTij to process data fragment Sij on resource node Rj is:

FDPTij = λj′·Di/Rj(C)

The time for data fragment Sij to complete its data scheduling on resource node Rj consists of the transmission time of the fragment plus its processing time, so the scheduling time FFTij of data fragment Sij processed on resource node Rj is:

FFTij = FTTij + FDPTij = cj + λj′·Di/Rj(N) + λj′·Di/Rj(C)

where Di is the amount of data to be processed by task Ti, Rj(C) is the computing power of resource node Rj, λj′ is the pre-slicing proportionality coefficient of data fragment Sij, cj is the communication delay between the task scheduler and resource node Rj, and Rj(N) is the network performance of resource node Rj.
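As a purely illustrative example with assumed figures: if Di = 100 MB, λj′ = 0.25, Rj(N) = 50 MB/s, Rj(C) = 100 MB/s and cj = 0.2 s, then FTTij = 0.2 + 25/50 = 0.7 s, FDPTij = 25/100 = 0.25 s, and FFTij = 0.95 s.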
D. The completion time of a task depends on the times at which all of its data fragments finish processing on their corresponding resource nodes. The final completion time of task Ti is the maximum of the completion times of task Ti on the individual resource nodes, so the final completion time of task Ti is:

TFTi = max_{1≤j≤m} FFTij = max_{1≤j≤m} [cj + λj′·Di/Rj(N) + λj′·Di/Rj(C)]

where m is the number of resources that can be requisitioned, Di is the amount of data to be processed by task Ti, Rj(C) is the computing power of resource node Rj, λj′ is the pre-slicing proportionality coefficient of data fragment Sij, cj is the communication delay between the task scheduler and resource node Rj, and Rj(N) is the network performance of resource node Rj.
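Continuing the assumed example, if the scheduling times on four pre-scheduled nodes come out as FFTi1 = 0.95 s, FFTi2 = 1.04 s, FFTi3 = 0.93 s and FFTi4 = 1.03 s, then TFTi = 1.04 s; the node with the largest scheduling time determines when the whole task completes.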
E. Performing policy optimization and solution. When data-processing tasks are executed in the cloud environment, the goal of scheduling is to complete the processing of the task-associated data submitted by the user in the shortest time. From the final completion time TFTi of task Ti, the completion time FFi of task Ti and the fragment proportionality coefficients λj of task Ti are calculated, where the completion time of task Ti is:

FFi = (Di + Σ_{k=1..m} ck·Rk(C)·Rk(N)/(Rk(C)+Rk(N))) / (Σ_{k=1..m} Rk(C)·Rk(N)/(Rk(C)+Rk(N)))

with Rj(C) the computing power of resource node Rj, Rj(N) its network performance, and cj the communication delay between the task scheduler and resource node Rj.
In solving the optimization strategy, the optimization objective is first converted into KKT (Karush-Kuhn-Tucker) conditions, because solving the KKT conditions simplifies the solution of the objective function. Specifically, the optimization objective is written as a Lagrangian expression under the inequality constraints, the constructed Lagrangian expression is converted into the KKT conditions, and solving the KKT conditions yields the data-fragment proportionality coefficient of data fragment Sij of task Ti:

λj = (FFi − cj)·Rj(C)·Rj(N) / (Di·(Rj(C)+Rj(N)))

where ck is the communication delay between the task scheduler and resource node Rk, Rk(C) is the computing performance of resource node Rk, and Rk(N) is the network transmission performance of resource node Rk; these quantities enter λj through FFi above.
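For illustration, the following Python sketch computes FFi and the coefficients λj by equalizing the per-node completion times cj + λj·Di/Rj(N) + λj·Di/Rj(C) under the constraint that the λj sum to 1, which is one consistent reading of the KKT solution described above; the closed form, the node figures, and the assumption that every λj stays positive belong to the sketch, not to the patent text.

```python
def slice_coefficients(d_i, nodes):
    """Solve for FF_i and λ_j by equalizing c_j + λ_j·D_i·(1/R_j(N) + 1/R_j(C)) across nodes.

    nodes: list of (c_j, R_j(N), R_j(C)); assumes the optimum keeps every λ_j > 0.
    Returns (FF_i, [λ_1, ..., λ_m]).
    """
    # h_j = R_j(C)·R_j(N) / (R_j(C) + R_j(N)) acts as an effective transfer-plus-processing rate.
    h = [rc * rn / (rc + rn) for _, rn, rc in nodes]
    ff_i = (d_i + sum(c * hj for (c, _, _), hj in zip(nodes, h))) / sum(h)
    lams = [(ff_i - c) * hj / d_i for (c, _, _), hj in zip(nodes, h)]
    return ff_i, lams

d_i = 100.0
nodes = [(0.2, 50.0, 100.0), (0.1, 40.0, 80.0), (0.3, 60.0, 120.0), (0.2, 45.0, 90.0)]
ff, lams = slice_coefficients(d_i, nodes)
print(round(ff, 3), [round(l, 3) for l in lams], round(sum(lams), 6))  # the coefficients sum to 1
```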
F. Secondary slicing: after the data-fragment proportionality coefficients λj are found in step E, a fragment of exactly Di·λj generally cannot be cut, because the minimum granularity of the data is indivisible; the data fragmentation must respect the divisible granularity of the data, and a secondary slicing is performed to obtain the actual fragmentation scheduling scheme of task Ti. Specifically:

In the actual slicing process, let the current remaining data amount be Di* and the current remaining number of divisible fragments be m*; according to the size of the data associated with task Ti and its divisible granularity, before the actual slicing Di* = Di and m* = m. Let the new scheduling policy be Si* = {Si1*, Si2*, …, Sim*}, where the fragment obtained by the calculation is Sij* (j = 1, 2, …, m) and the size of Sij* is Dij*. The proportionality coefficients λj (j = 1, 2, …, m) are used to perform the secondary data fragmentation and obtain the new scheduling policy.
The largest value λk among the data-fragment proportionality coefficients λj is taken, and the actual fragment size Dik* of data fragment Sik of task Ti is obtained by adjusting λk·Di to the divisible granularity of the data.
G. The remaining data amount is updated as Di* = Di* − Dik* and the remaining number of divisible fragments as m* = m* − 1, where m is the original number of fragments; if Di* = 0 or m* = 0, the data slicing is completed and the sizes of the remaining pre-fragments are set to 0; otherwise, the largest proportionality coefficient λk is set to 0 and step F is executed again.
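A minimal sketch of the secondary-slicing loop of steps F and G follows; the round-down-to-granularity rule and the treatment of the last remaining fragment are assumptions of the sketch, since the text only requires the fragments to respect the minimum divisible granularity.

```python
def secondary_slice(d_i: float, ms_i: float, lams: list[float]) -> list[float]:
    """Steps F and G: turn the ideal coefficients λ_j into actual fragment sizes D_ij*.

    Repeatedly takes the largest remaining λ_k, cuts a fragment rounded down to a
    multiple of the granularity MS_i (assumed rounding rule), updates the remaining
    data D_i* and fragment count m*, and stops when either reaches zero; any
    pre-fragments never visited keep size 0.
    """
    lams = list(lams)
    sizes = [0.0] * len(lams)
    d_rem = d_i                 # D_i*: remaining data amount
    m_rem = len(lams)           # m*: remaining number of divisible fragments
    while d_rem > 0 and m_rem > 0:
        k = max(range(len(lams)), key=lambda j: lams[j])      # largest λ_k (step F)
        if m_rem == 1:
            cut = d_rem                                        # last fragment takes the rest (assumption)
        else:
            cut = min(d_rem, (lams[k] * d_i // ms_i) * ms_i)   # round down to granularity (assumption)
        sizes[k] = cut
        d_rem -= cut            # D_i* = D_i* − D_ik*
        m_rem -= 1              # m* = m* − 1
        lams[k] = 0.0           # exclude this coefficient from the next iteration
    return sizes

print(secondary_slice(96.0, 8.0, [0.26, 0.23, 0.27, 0.24]))   # e.g. [24.0, 32.0, 24.0, 16.0]
```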
H. After data fragmentation is completed, the new actual fragmentation strategy Si* = {Si1*, Si2*, …, Sij*, …, Sim*} of task Ti is obtained, where data fragment Sij* has size Dij* and each value Dij* is obtained by the above calculation; if Dij* ≠ 0, the data fragment Sij* of size Dij* is scheduled to resource node Rj for data processing; if Dij* = 0, the data fragment Sij* is empty and no resource is allocated, where j = 1 … m.
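Finally, a small illustrative sketch of the dispatch in step H; the fragment map and the send callback are assumptions of this example rather than part of the invention.

```python
def dispatch(fragments: dict[str, float], send) -> None:
    """Step H: dispatch each non-empty fragment S_ij* (size D_ij*) to its node R_j.

    `fragments` maps node name R_j -> D_ij*; `send` is an assumed callback that
    transfers a fragment of the given size to the named node.
    """
    for node, size in fragments.items():
        if size == 0:          # D_ij* == 0: fragment is empty, no resource requisitioned
            continue
        send(node, size)

# Assumed example: after secondary slicing, two of four pre-scheduled nodes receive data.
dispatch({"R1": 32.0, "R2": 0.0, "R3": 64.0, "R4": 0.0},
         lambda n, s: print(f"send {s} MB to {n}"))
```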
The method of this embodiment (abbreviated SDSA) is compared by computer simulation with a prior-art task scheduling method based on average sharding (ESDSA), using task response time under different resource node scales and different task data volumes as the performance indicator. The task execution time satisfaction rate expresses how well the task scheduling algorithm meets the user's requirement on task execution time, and reflects both the degree to which user requirements are guaranteed and the task processing efficiency of the cloud environment.
Fig. 3 shows the response time comparison of the two methods under different resource node scales, and fig. 4 shows the response time comparison of the two methods for tasks with different data volumes. The comparison shows that, relative to the existing method, the method of the invention better satisfies the user's requirement on task execution time, while the distribution efficiency and execution efficiency of the tasks are significantly improved.