Background
Because a cloud computing system is diversified and shares resources among a very large user base, the number of tasks and the volume of data to be processed at any moment in the cloud environment are also very large. Scheduling the resources and tasks in the cloud environment reasonably and efficiently, guaranteeing quality of service for users, and improving both task execution efficiency and resource utilization are therefore key and difficult points in cloud computing research. The physical resource nodes in a cloud computing system are heterogeneous, node states change dynamically as resources are invoked, resources in the system are continuously allocated and released, and new resources may join the current cloud system as the system or the task requirements change; in addition, task scheduling in different application scenarios has different characteristics, so the complexity of scheduling algorithms in a cloud environment is extremely high. Therefore, formulating an effective task scheduling strategy for a specific application scenario according to the characteristics of that scenario, and establishing a reasonable mapping between task nodes and resource nodes, can shorten task execution time and improve task execution efficiency.
In recent years, with the further development of cloud computing technology and the wide adoption of its business service models, researchers have carried out extensive work on task scheduling algorithms in the cloud environment. After sorting and induction, the main concerns of task scheduling research in the cloud environment are: the execution efficiency of the task scheduling algorithm, the quality-of-service control of task scheduling in the cloud computing environment, and the economic benefit of the cloud service provider. The existing task scheduling algorithms fall mainly into the following categories. (1) Heuristic task scheduling algorithms: these mainly include the particle swarm algorithm, the ant colony algorithm, the gravitational search algorithm, the genetic algorithm, and the like. Such scheduling algorithms map tasks to appropriate resources according to changes in resource state during scheduling, completing resource allocation and task execution; many studies have improved on these basic ideas to raise their performance and their applicability in specific scenarios. (2) Scheduling algorithms based on QoS constraints: these algorithms satisfy the quality of service of user task scheduling as far as possible, taking the deadline, the scheduling budget, reliability, security, and the like as scheduling constraints so as to meet the user's preferences when selecting resources. (3) Scheduling algorithms based on the revenue of the cloud service provider: these algorithms consider scheduling from the provider's perspective and, on the premise that task scheduling is completed, utilize resources to the greatest extent or reduce resource consumption, thereby improving the provider's benefit.
When a task is scheduled, that is, when task nodes are mapped to resource nodes, the scheduling effect is closely related to the computation cost and the communication cost of scheduling. Cloud users expect a response shortly after a task is submitted to the cloud platform, while cloud providers need to deliver good quality of service with limited resources, so from both perspectives the response time of a task is an important index of task scheduling. In data processing services, a common way to shorten the processing time of a task and make full use of idle resources in the system is to fragment the data associated with the task and distribute the fragments to several idle resource nodes for execution. However, there is still no good scheduling strategy for deciding how to fragment, that is, how many fragments to create, how much data each fragment should contain, and which resource node each fragment should be matched with. The conventional approach is average fragmentation: the data to be processed is divided evenly according to the number of available resources, and the resulting fragments are distributed to the resource nodes for execution.
Disclosure of Invention
The invention provides a resource requisition scheduling method for data fragmentation, which can reasonably fragment big data, requisition resources for the fragments, and make full use of idle available resources for data processing.
The resource requisition scheduling method for data fragmentation of the invention comprises the following steps:
A. Determining the maximum number of data fragments DNi into which task Ti can be split:

DNi = Di / MSi

where i is the task number, Di is the amount of data to be processed by task Ti, and MSi is the minimum divisible granularity of the data of task Ti;
B. Screening the available resources, and selecting m resources from the n resource nodes to add to a pre-scheduling set, where m = DNi;
C. According to the pre-slicing proportionality coefficients λj′, determining the scheduling time FFTij of each data fragment allocated to each pre-scheduled resource node in the pre-scheduling set. The pre-slicing proportionality coefficient is a coefficient assigned evenly, or preset, to the data fragments while the computing power and network transmission capability of the nodes are not yet known; it is then used for each pre-scheduled resource in the calculation of the scheduling time FFTij below;
D. Calculating the final completion time TFTi of task Ti as the maximum, over the resource nodes, of the times at which the data fragments of task Ti are completed on their respective resource nodes;
E. From the final completion time TFTi of task Ti, calculating the completion time FFi of task Ti and the data-fragment proportionality coefficients λj of task Ti;
F. Secondary slicing: taking the largest value λk among the data-fragment proportionality coefficients λj, and obtaining the actual fragment size Dik* of data fragment Sik of task Ti;
G. Updating the remaining data amount as Di* = Di* − Dik* and the remaining number of divisible fragments as m* = m* − 1, where m is the original number of fragments; if Di* = 0 or m* = 0, the data slicing is completed and the sizes of the remaining pre-fragments are set to 0; otherwise, the largest proportionality coefficient λk is set to 0 and step F is executed again;
H. After data fragmentation is completed, obtaining the actual fragmentation strategy Si* = {Si1*, Si2*, …, Sij*, …, Sim*} of task Ti, where data fragment Sij* has size Dij* and each value Dij* is obtained by the above calculation; if Dij* ≠ 0, the data fragment Sij* of size Dij* is scheduled to resource node Rj for data processing; if Dij* = 0, the data fragment Sij* is empty and no resource is allocated, where j = 1 … m.
Through the secondary fragmentation, the invention can make full use of the performance of the available resource nodes in resource requisition scheduling; a reasonable data fragmentation strategy is adopted and a suitable resource node is requisitioned for each data fragment to perform data processing, so that the overall completion time of the task is minimized and the execution efficiency of the task is effectively improved.
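For illustration only, the following minimal Python sketch shows one way steps A and B might look in code; the ResourceNode structure, the function names, and the ranking rule used to pick the m pre-scheduled nodes are assumptions made for this example and are not prescribed by the invention.

```python
import math
from dataclasses import dataclass

@dataclass
class ResourceNode:
    """Assumed description of an idle resource node screened in step B."""
    name: str
    compute: float   # R_j(C): computing power, e.g. MB of task data processed per second
    network: float   # R_j(N): network transmission performance, e.g. MB per second
    delay: float     # c_j: communication delay to the task scheduler, in seconds

def max_fragments(data_amount: float, min_granularity: float) -> int:
    """Step A: DN_i = D_i / MS_i (taking the integer part here is an assumption)."""
    return max(1, math.floor(data_amount / min_granularity))

def preschedule(nodes: list[ResourceNode], dn_i: int) -> list[ResourceNode]:
    """Step B: screen the n available nodes and keep m = DN_i of them.

    The ranking criterion is an assumption; here nodes with higher combined
    compute and network capability are preferred when n > DN_i.
    """
    ranked = sorted(nodes, key=lambda r: r.compute + r.network, reverse=True)
    return ranked[:dn_i]

if __name__ == "__main__":
    nodes = [ResourceNode(f"R{j}", compute=50.0 + 10 * j, network=40.0 + 5 * j, delay=0.1 * j)
             for j in range(1, 7)]                                 # n = 6 idle nodes, made-up figures
    dn = max_fragments(data_amount=100.0, min_granularity=25.0)    # DN_i = 4
    pre_set = preschedule(nodes, dn)                               # m = 4 pre-scheduled nodes
    print(dn, [r.name for r in pre_set])
```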
Specifically, the scheduling time FFTij of each pre-scheduled resource node in step C is calculated as:

FFTij = cj + λj′·Di/Rj(N) + λj′·Di/Rj(C)

where Di is the amount of data to be processed by task Ti, Rj(C) is the computing power of resource node Rj, λj′ is the pre-slicing proportionality coefficient of data fragment Sij, cj is the communication delay between the task scheduler and resource node Rj, and Rj(N) is the network performance of resource node Rj.
Specifically, the final completion time TFTi of task Ti in step D is calculated as:

TFTi = max_{1≤j≤m} FFTij = max_{1≤j≤m} [cj + λj′·Di/Rj(N) + λj′·Di/Rj(C)]

where m is the number of resources that can be requisitioned, Di is the amount of data to be processed by task Ti, Rj(C) is the computing power of resource node Rj, λj′ is the pre-slicing proportionality coefficient of data fragment Sij, cj is the communication delay between the task scheduler and resource node Rj, and Rj(N) is the network performance of resource node Rj.
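As an illustrative sketch of the two formulas above (not part of the claimed method), the following Python fragment computes FFTij and TFTi; the node figures and the equal pre-slicing coefficients λj′ = 1/m are assumed values.

```python
def fft(d_i: float, lam: float, c_j: float, r_n: float, r_c: float) -> float:
    """FFT_ij = c_j + λ_j'·D_i / R_j(N) + λ_j'·D_i / R_j(C): scheduling time on node j."""
    return c_j + lam * d_i / r_n + lam * d_i / r_c

def tft(d_i: float, lams: list[float], nodes: list[tuple[float, float, float]]) -> float:
    """TFT_i = max over the m pre-scheduled nodes of FFT_ij; nodes hold (c_j, R_j(N), R_j(C))."""
    return max(fft(d_i, lam, c, rn, rc) for lam, (c, rn, rc) in zip(lams, nodes))

# Assumed example: D_i = 100, m = 4 pre-scheduled nodes, equal pre-slicing coefficients of 0.25.
nodes = [(0.2, 50.0, 100.0), (0.1, 40.0, 80.0), (0.3, 60.0, 120.0), (0.2, 45.0, 90.0)]
print(tft(100.0, [0.25] * 4, nodes))
```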
Specifically, in step E the completion time FFi of task Ti is calculated from the final completion time TFTi of task Ti as:

FFi = (Di + Σ_{k=1..m} ck·Rk(C)·Rk(N)/(Rk(C)+Rk(N))) / (Σ_{k=1..m} Rk(C)·Rk(N)/(Rk(C)+Rk(N)))

where m is the number of resources that can be requisitioned, Di is the amount of data to be processed by task Ti, Rj(C) is the computing power of resource node Rj, Rj(N) is its network performance, and cj is the communication delay between the task scheduler and resource node Rj; and the proportionality coefficient λj of data fragment Sij is:

λj = (FFi − cj)·Rj(C)·Rj(N) / (Di·(Rj(C)+Rj(N)))

where ck is the communication delay between the task scheduler and resource node Rk, Rk(C) is the computing performance of resource node Rk, and Rk(N) is the network transmission performance of resource node Rk; these quantities enter λj through FFi above.
Specifically, the actual fragment size Dik* of data fragment Sik of task Ti in step F is obtained by adjusting λk·Di to the minimum divisible granularity of the data, where m is the number of resources that can be requisitioned.
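As a purely illustrative numeric example with assumed figures: if the largest coefficient is λk = 0.34, Di = 100 MB and the minimum divisible granularity is MSi = 8 MB, then λk·Di = 34 MB is not an integer multiple of 8 MB, so the actual fragment Dik* is adjusted to a multiple of the granularity, for example 32 MB under a round-down rule, and the difference is carried over into the remaining data amount Di* handled in step G.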
The resource requisition scheduling method for data fragments of the invention can make full use of the performance of available resource nodes in resource requisition scheduling; it adopts a reasonable data fragmentation strategy and requisitions a suitable resource node for each data fragment to perform data processing, so that the overall completion time of the task is minimized and the execution efficiency of the task is effectively improved.
The present invention will be described in further detail with reference to the following examples. This should not be understood as limiting the scope of the above-described subject matter of the present invention to the following examples. Various substitutions and alterations according to the general knowledge and conventional practice in the art are intended to be included within the scope of the present invention without departing from the technical spirit of the present invention as described above.
Detailed Description
As shown in fig. 1, the resource requisition scheduling method for data fragmentation of the invention includes:
A. Determining the maximum number of data fragments DNi into which task Ti can be split:

DNi = Di / MSi

where i is the task number, Di is the amount of data to be processed by task Ti, and MSi is the minimum divisible granularity of the data of task Ti.
B. Screening the available resources, and selecting m resources from the n resource nodes to add to a pre-scheduling set, where m = DNi.
In a cloud computing environment, after a user submits tasks to the cloud platform, the task scheduler in the cloud platform selects appropriate resources for the submitted task queue and executes the tasks according to the formulated task scheduling strategy. When the cloud environment processes a task with a large data volume, resource requisition is needed: the idle, scattered resources in the cloud environment are fully utilized, the data associated with the data-processing task are divided into several data fragments, and the fragments are then dispatched to the resource nodes for processing according to the scheduling strategy, so that the overall execution efficiency of the task is improved. The data-slicing task scheduling model is shown in fig. 2.
C. When data fragment Sij of task Ti is transmitted from the task scheduler to resource node Rj, the transmission duration of the data fragment is the communication delay plus the data transmission delay. With the communication delay denoted cj and the network performance of resource node Rj denoted Rj(N), the transmission time FTTij of data fragment Sij is:

FTTij = cj + λj′·Di/Rj(N)

where λj′ is the pre-slicing proportionality coefficient. When data fragment Sij is scheduled on resource node Rj, according to the computing power Rj(C) of resource node Rj, the time FDPTij to process data fragment Sij on resource node Rj is:

FDPTij = λj′·Di/Rj(C)

The time for data fragment Sij to complete its data scheduling on resource node Rj consists of the transmission time of the fragment plus its processing time, so the scheduling time FFTij of data fragment Sij processed on resource node Rj is:

FFTij = FTTij + FDPTij = cj + λj′·Di/Rj(N) + λj′·Di/Rj(C)

where Di is the amount of data to be processed by task Ti, Rj(C) is the computing power of resource node Rj, λj′ is the pre-slicing proportionality coefficient of data fragment Sij, cj is the communication delay between the task scheduler and resource node Rj, and Rj(N) is the network performance of resource node Rj.
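As a purely illustrative example with assumed figures: if Di = 100 MB, λj′ = 0.25, Rj(N) = 50 MB/s, Rj(C) = 100 MB/s and cj = 0.2 s, then FTTij = 0.2 + 25/50 = 0.7 s, FDPTij = 25/100 = 0.25 s, and FFTij = 0.95 s.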
D. The completion time of a task depends on the times at which all of its data fragments finish processing on their corresponding resource nodes. The final completion time of task Ti is the maximum of the completion times of task Ti on the individual resource nodes, so the final completion time of task Ti is:

TFTi = max_{1≤j≤m} FFTij = max_{1≤j≤m} [cj + λj′·Di/Rj(N) + λj′·Di/Rj(C)]

where m is the number of resources that can be requisitioned, Di is the amount of data to be processed by task Ti, Rj(C) is the computing power of resource node Rj, λj′ is the pre-slicing proportionality coefficient of data fragment Sij, cj is the communication delay between the task scheduler and resource node Rj, and Rj(N) is the network performance of resource node Rj.
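Continuing the assumed example, if the scheduling times on four pre-scheduled nodes come out as FFTi1 = 0.95 s, FFTi2 = 1.04 s, FFTi3 = 0.93 s and FFTi4 = 1.03 s, then TFTi = 1.04 s; the node with the largest scheduling time determines when the whole task completes.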
E. Performing policy optimization and solution. When data-processing tasks are executed in the cloud environment, the goal of scheduling is to complete the processing of the task-associated data submitted by the user in the shortest time. From the final completion time TFTi of task Ti, the completion time FFi of task Ti and the fragment proportionality coefficients λj of task Ti are calculated, where the completion time of task Ti is:

FFi = (Di + Σ_{k=1..m} ck·Rk(C)·Rk(N)/(Rk(C)+Rk(N))) / (Σ_{k=1..m} Rk(C)·Rk(N)/(Rk(C)+Rk(N)))

with Rj(C) the computing power of resource node Rj, Rj(N) its network performance, and cj the communication delay between the task scheduler and resource node Rj.
In solving the optimization strategy, the optimization objective is first converted into KKT (Karush-Kuhn-Tucker) conditions, because solving the KKT conditions simplifies the solution of the objective function. Specifically, the optimization objective is written as a Lagrangian expression under the inequality constraints, the constructed Lagrangian expression is converted into the KKT conditions, and solving the KKT conditions yields the data-fragment proportionality coefficient of data fragment Sij of task Ti:

λj = (FFi − cj)·Rj(C)·Rj(N) / (Di·(Rj(C)+Rj(N)))

where ck is the communication delay between the task scheduler and resource node Rk, Rk(C) is the computing performance of resource node Rk, and Rk(N) is the network transmission performance of resource node Rk; these quantities enter λj through FFi above.
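For illustration, the following Python sketch computes FFi and the coefficients λj by equalizing the per-node completion times cj + λj·Di/Rj(N) + λj·Di/Rj(C) under the constraint that the λj sum to 1, which is one consistent reading of the KKT solution described above; the closed form, the node figures, and the assumption that every λj stays positive belong to the sketch, not to the patent text.

```python
def slice_coefficients(d_i, nodes):
    """Solve for FF_i and λ_j by equalizing c_j + λ_j·D_i·(1/R_j(N) + 1/R_j(C)) across nodes.

    nodes: list of (c_j, R_j(N), R_j(C)); assumes the optimum keeps every λ_j > 0.
    Returns (FF_i, [λ_1, ..., λ_m]).
    """
    # h_j = R_j(C)·R_j(N) / (R_j(C) + R_j(N)) acts as an effective transfer-plus-processing rate.
    h = [rc * rn / (rc + rn) for _, rn, rc in nodes]
    ff_i = (d_i + sum(c * hj for (c, _, _), hj in zip(nodes, h))) / sum(h)
    lams = [(ff_i - c) * hj / d_i for (c, _, _), hj in zip(nodes, h)]
    return ff_i, lams

d_i = 100.0
nodes = [(0.2, 50.0, 100.0), (0.1, 40.0, 80.0), (0.3, 60.0, 120.0), (0.2, 45.0, 90.0)]
ff, lams = slice_coefficients(d_i, nodes)
print(round(ff, 3), [round(l, 3) for l in lams], round(sum(lams), 6))  # the coefficients sum to 1
```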
F. Secondary slicing: after the data-fragment proportionality coefficients λj are found in step E, a fragment of exactly Di·λj generally cannot be cut, because the minimum granularity of the data is indivisible; the data fragmentation must respect the divisible granularity of the data, and a secondary slicing is performed to obtain the actual fragmentation scheduling scheme of task Ti. Specifically:

In the actual slicing process, let the current remaining data amount be Di* and the current remaining number of divisible fragments be m*; according to the size of the data associated with task Ti and its divisible granularity, before the actual slicing Di* = Di and m* = m. Let the new scheduling policy be Si* = {Si1*, Si2*, …, Sim*}, where the fragment obtained by the calculation is Sij* (j = 1, 2, …, m) and the size of Sij* is Dij*. The proportionality coefficients λj (j = 1, 2, …, m) are used to perform the secondary data fragmentation and obtain the new scheduling policy.
The largest value λk among the data-fragment proportionality coefficients λj is taken, and the actual fragment size Dik* of data fragment Sik of task Ti is obtained by adjusting λk·Di to the divisible granularity of the data.
G. The remaining data amount is updated as Di* = Di* − Dik* and the remaining number of divisible fragments as m* = m* − 1, where m is the original number of fragments; if Di* = 0 or m* = 0, the data slicing is completed and the sizes of the remaining pre-fragments are set to 0; otherwise, the largest proportionality coefficient λk is set to 0 and step F is executed again.
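A minimal sketch of the secondary-slicing loop of steps F and G follows; the round-down-to-granularity rule and the treatment of the last remaining fragment are assumptions of the sketch, since the text only requires the fragments to respect the minimum divisible granularity.

```python
def secondary_slice(d_i: float, ms_i: float, lams: list[float]) -> list[float]:
    """Steps F and G: turn the ideal coefficients λ_j into actual fragment sizes D_ij*.

    Repeatedly takes the largest remaining λ_k, cuts a fragment rounded down to a
    multiple of the granularity MS_i (assumed rounding rule), updates the remaining
    data D_i* and fragment count m*, and stops when either reaches zero; any
    pre-fragments never visited keep size 0.
    """
    lams = list(lams)
    sizes = [0.0] * len(lams)
    d_rem = d_i                 # D_i*: remaining data amount
    m_rem = len(lams)           # m*: remaining number of divisible fragments
    while d_rem > 0 and m_rem > 0:
        k = max(range(len(lams)), key=lambda j: lams[j])      # largest λ_k (step F)
        if m_rem == 1:
            cut = d_rem                                        # last fragment takes the rest (assumption)
        else:
            cut = min(d_rem, (lams[k] * d_i // ms_i) * ms_i)   # round down to granularity (assumption)
        sizes[k] = cut
        d_rem -= cut            # D_i* = D_i* − D_ik*
        m_rem -= 1              # m* = m* − 1
        lams[k] = 0.0           # exclude this coefficient from the next iteration
    return sizes

print(secondary_slice(96.0, 8.0, [0.26, 0.23, 0.27, 0.24]))   # e.g. [24.0, 32.0, 24.0, 16.0]
```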
H. After data fragmentation is completed, the new actual fragmentation strategy Si* = {Si1*, Si2*, …, Sij*, …, Sim*} of task Ti is obtained, where data fragment Sij* has size Dij* and each value Dij* is obtained by the above calculation; if Dij* ≠ 0, the data fragment Sij* of size Dij* is scheduled to resource node Rj for data processing; if Dij* = 0, the data fragment Sij* is empty and no resource is allocated, where j = 1 … m.
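Finally, a small illustrative sketch of the dispatch in step H; the fragment map and the send callback are assumptions of this example rather than part of the invention.

```python
def dispatch(fragments: dict[str, float], send) -> None:
    """Step H: dispatch each non-empty fragment S_ij* (size D_ij*) to its node R_j.

    `fragments` maps node name R_j -> D_ij*; `send` is an assumed callback that
    transfers a fragment of the given size to the named node.
    """
    for node, size in fragments.items():
        if size == 0:          # D_ij* == 0: fragment is empty, no resource requisitioned
            continue
        send(node, size)

# Assumed example: after secondary slicing, two of four pre-scheduled nodes receive data.
dispatch({"R1": 32.0, "R2": 0.0, "R3": 64.0, "R4": 0.0},
         lambda n, s: print(f"send {s} MB to {n}"))
```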
The method of this embodiment (abbreviated SDSA) is compared by computer simulation with a prior-art task scheduling method based on average sharding (ESDSA), using task response time under different resource node scales and different task data volumes as the performance indicator. The task execution time satisfaction rate expresses how well the task scheduling algorithm meets the user's requirement on task execution time, and reflects both the degree to which user requirements are guaranteed and the task processing efficiency of the cloud environment.
Fig. 3 shows the response time comparison of the two methods under different resource node scales, and fig. 4 shows the response time comparison of the two methods for tasks with different data volumes. The comparison shows that, relative to the existing method, the method of the invention better satisfies the user's requirement on task execution time, while the distribution efficiency and execution efficiency of the tasks are significantly improved.