
Haruna Ahmed Abba, Nordin B. Zakaria, Syed Nasir Mehmood Shah, and Anindya J. Pal
High Performance Computing Service Center (HPCC), Universiti Teknologi PETRONAS, Seri Iskandar, 31750 Tronoh, Perak, Malaysia
ahmadydee@gmail.com, nordinzakaria@petronas.com.my, nasirsyed.utp@gmail.com, anindyajp@gmail.com

I. INTRODUCTION

The term "grid" appears to have originated in the mid-1990s to denote a proposed distributed computing infrastructure for advanced science and engineering projects [1]. The objective of grid computing is to combine the computing power of widely distributed resources and to deliver non-trivial services to users [2]. For several years, grid computing has been distinguished by its focus on virtual organizations [3] that share large-scale resources, enable innovative applications and, in some cases, pursue high performance. The continual growth of communication networks, in both quality and availability, is increasing interest in the grid computing paradigm [4], in which geographically distributed computing resources are logically coupled to operate as a single computational unit. The grid approach [5] is a new generation of technologies that brings together physical resources and applications to provide far more efficient solutions to complex scientific, engineering and business problems. Scheduling on a grid proceeds in three main phases. Phase one is resource discovery, which generates a list of potential resources. Phase two consists of gathering information about those resources and selecting the best set to match the application requirements.
In the last phase the task is executed, which includes file staging and cleanup. Scheduling problems are typically NP-hard [6], and the main goal of scheduling is to achieve high performance in grid computing [7]. In recent years, researchers have proposed many different approaches to dynamic job scheduling. Our approach, by contrast, is based on the concept of software project management, in which a project consists of modules and each module is divided into tasks, referred to here as jobs. In existing schemes, job execution is usually based on the expected completion time or the actual completion time of the job. In this paper, we propose a new scheduling algorithm, the Prioritized Deadline based Scheduling Algorithm (PDSA), which treats the job deadline as the prime attribute for job execution. Grid users are highly interested in executing their jobs in a timely manner under deadline constraints, yet most scheduling algorithms do not consider deadlines when executing jobs. PDSA is proposed to meet the deadline constraints given by users' requirements. The system perspective (i.e., minimizing the average turnaround time) has also been considered in the design of the algorithm. An extensive performance comparison using synthetic workload traces is presented to evaluate the efficiency and robustness of the grid scheduling algorithms.

The rest of this paper is organized as follows. Section II gives an overview of previous research on resource scheduling. Section III describes the baseline scheduling algorithms, and Section IV presents the design and implementation details of our proposed algorithm. Section V describes the experimental results, and Section VI concludes the paper.

II. RELATED WORK

Many different approaches to dynamic job scheduling have been proposed. The author of [8] used Fuzzy C-Mean and Genetic Algorithms for dynamic job scheduling.
This model classifies jobs mainly with the Fuzzy C-Mean algorithm and maps the jobs to appropriate resources mainly with a Genetic Algorithm. The approach divides the workload data into three classes based on historical job run-time data, which proved promising; however, the job submission time should also be considered, because scheduling is more efficient when the user knows both the submission time and the finishing time of a job, which helps avoid execution delays. In related work, [9] applied a static job scheduling algorithm using Fuzzy C-Mean and Genetic Algorithms. This model allocates jobs to distinct nodes, using the Fuzzy C-Mean algorithm to predict the characteristics of jobs running in the Grid environment and a Genetic Algorithm to allocate jobs to a large pool of shared resources. Similarly, [10] presented simulation results for a Grid environment with respect to allocating jobs to distinct nodes. The results validate the model, which uses Fuzzy C-Mean clustering to predict job characteristics and to optimize job scheduling in the Grid environment; this prediction and optimization engine schedules jobs based on historical information. In another study, [11] presented a fault-tolerant scheduling framework based on DIOGENES ("DIstributed Optimal GENEtic algorithm for grid application Scheduling"), which is mapped onto the architecture of MedioGRID, a real-time satellite image processing system operating in a Grid environment. The proposed solution provides a fault-tolerant mechanism for mapping image processing applications onto the available resources in MedioGRID clusters, with uniform access.
The work in [12] improved the particle swarm optimization (PSO) algorithm with a discrete coding rule for grid scheduling, optimizing both grid task scheduling and grid resource allocation. Similarly, [13] implemented a new PSO-based approach to solve task scheduling problems in the grid; the new algorithm generates an optimal schedule that completes tasks within minimum time while using resources efficiently. In related work, [14] proposed a novel approach based on a hybrid of PSO and GELS (GPSO) to solve the grid scheduling problem, reducing both makespan and the number of missed tasks. Furthermore, [15] evaluated a proposed GA-based scheduler against existing traditional algorithms, and its simulation results show that the proposed approach can find optimized solutions. In [16], a cost-based workflow scheduling algorithm was presented to minimize the cost of execution while meeting the deadline. A Markov Decision Process approach is used to schedule workflow task execution step by step, so that it can find the optimal path among services to execute tasks and to transfer input or output data. However, to be more efficient, additional priorities need to be considered, such as the maximum turnaround time and the delay incurred when unexecuted jobs are rescheduled. The work in [17] addresses the fairness problem by reducing the service time error. The algorithm assigns each task sufficient computational power to complete it within its deadline, and the resources that each user receives are proportional to the user's weight, or share. Tasks are thus scheduled according to an error measure, called the service time error, that captures fairness among users.
However, this approach could be better optimized if priority were given according to the minimum execution time of a job rather than individual demand. In another work, [18] proposed a new job scheduling policy based on backfilling (JR-backfilling). The main goals of these policies were to decrease the workload execution time, job waiting time, job response time and average bounded slowdown, and to optimize resource utilization. The approach in [19] aims to maximize resource utilization and reduce the processing time of jobs. Its grid resource selection is based on a Max Heap Tree (MHT), which is well suited to large-scale applications; the root node of the MHT is selected for job submission.

Project management is a well-known area of operations research. Here we propose a new Prioritized Deadline based Scheduling Algorithm (PDSA) using a project management technique; PDSA is thus a direct application of project management to grid computing.

III. BASELINE SCHEDULING ALGORITHMS

We simulate two traditional algorithms, the Earliest Deadline First (EDF) scheduling algorithm and the Round Robin (RR) scheduling algorithm, as baselines against which to compare the performance of our newly developed Prioritized Deadline based Scheduling Algorithm (PDSA), and we analyze the results.

A. Round Robin Scheduling Algorithm (RR): In this scheme the ready queue is maintained as a FIFO queue. The process control block (PCB) of a process submitted to the system is linked to the tail of the queue. The algorithm dispatches processes from the head of the ready queue for execution by the CPU. The executing process is preempted when its time quantum, a system-defined variable, expires, and the preempted process's PCB is linked to the tail of the ready queue. When a process completes its task before its time quantum expires, it terminates and is deleted from the system.
The next process is then dispatched from the head of the ready queue.

B. Earliest Deadline First (EDF) Scheduling Algorithm: EDF is one of the simplest and best-known scheduling algorithms: the earlier a process's deadline, the higher its priority. Processes are dispatched from the ready queue in order of minimum deadline. When a process has completed its task, it terminates, and the job with the minimum deadline in the ready queue is dispatched next. A drawback of EDF is that when two tasks have the same absolute deadline, it chooses one of them arbitrarily (ties can be broken arbitrarily), which reduces fairness between jobs.

IV. PROPOSED SCHEDULING ALGORITHM

Prioritized Deadline based Scheduling Algorithm (PDSA): This algorithm first executes the process with the smallest time delay, i.e., the smallest gap between its deadline and its burst time. In our algorithm, jobs are allocated to a single processor according to deadline criteria based on the minimum time delay of job execution, the turnaround time and the maximum tardiness. These criteria are defined below using the following notation:

$J_i$: the ith job;
$n$: the number of jobs;
$T_i$: arrival time of job i;
$d_i$: deadline of job i;
$\alpha_i$: burst time of job i;
$C_i$: completion time of job i;
$TTR_i$: turnaround time of job i;
$TTD_i$: time delay of job i;
$TTRD_i$: tardiness of job i;
$T_{Max\_TRD}$: maximum tardiness;
S-list: sorted list.

I. Time delay: the difference between the deadline and the burst time of a job:

$$TTD_i = d_i - \alpha_i \qquad (1)$$

II. Turnaround time: the total time between the submission of a job for execution and the return of the completed result:

$$TTR_i = C_i - T_i \qquad (2)$$

The average turnaround time over all n jobs is

$$\overline{TTR} = \frac{1}{n}\sum_{i=1}^{n} TTR_i \qquad (3)$$

III. Maximum tardiness: the maximum time delay between turnaround time and deadline. The tardiness of job i is

$$TTRD_i = d_i - TTR_i \qquad (4)$$

and the maximum tardiness is therefore

$$T_{Max\_TRD} = \max(TTRD_1, TTRD_2, \ldots, TTRD_n) \qquad (5)$$

The algorithm takes its input from users; each job is described by its process ID, arrival time, burst time and deadline. PDSA computes the time delay of each job with (1), sorts the jobs by time delay in ascending order, and selects the job with the minimum time delay for execution. If multiple jobs have the same time delay, the tie is broken on a first-come, first-served (FCFS) basis. The selected job is executed on the CPU for its given burst time (i.e., its demand) in a non-preemptive way. Finally, the turnaround time and tardiness of each job, the average turnaround time over all user jobs, and the maximum tardiness, which identifies the maximum time delay in job execution, are computed. The compact algorithm is presented below as a Python sketch (assuming the CPU idles until a selected job has arrived, which the step list does not specify):

```python
from dataclasses import dataclass

@dataclass
class Job:
    pid: int          # processID
    arrival: float    # T_i
    burst: float      # alpha_i (CPU time)
    deadline: float   # d_i

def pdsa(jobs):
    """Algorithm PDSA. Input: pool of jobs with processID, arrival time,
    CPU time and deadline."""
    # Eq. (1): sort by time delay TTD_i = d_i - alpha_i in ascending order;
    # equal TTD values are ordered FCFS, i.e. by arrival time (S-list)
    s_list = sorted(jobs, key=lambda j: (j.deadline - j.burst, j.arrival))
    clock = 0.0
    turnaround, tardiness = [], []
    for job in s_list:  # while S-list is not empty
        # Execute non-preemptively for the job's burst time (its demand).
        # Assumption: the CPU idles until the selected job has arrived.
        completion = max(clock, job.arrival) + job.burst  # C_i
        clock = completion
        turnaround.append(completion - job.arrival)       # Eq. (2)
        tardiness.append(job.deadline - turnaround[-1])   # Eq. (4)
    avg_turnaround = sum(turnaround) / len(turnaround)    # Eq. (3)
    max_tardiness = max(tardiness)                        # Eq. (5)
    return [j.pid for j in s_list], avg_turnaround, max_tardiness
```

V. EXPERIMENTAL RESULTS

Our simulator has been used to carry out extensive experimentation on the Windows 7 operating system running on an Intel Core4 Duo. We used the Grid Workloads Archive LCG data traces, provided by the e-Science Group of HEP at Imperial College London, for process set generation in our experiments. The simulations of the algorithms generated useful data, which we analyzed. To check the performance of the algorithms, i.e., the PDSA, EDF and RR scheduling algorithms, we used burst time values of 10, 100 and 1000, reflecting the heterogeneous demands of user jobs with different characteristics, and ran them through the simulator. Each process is specified by its CPU burst length, arrival time and priority number, and each process set has been given a time quantum for simulation. The performance metrics for the CPU scheduling algorithms are the Average Turnaround Time and the Maximum Tardiness. The graphs obtained for the PDSA, EDF and RR scheduling algorithms are presented below, followed by a discussion: Fig. 1 shows the Average Turnaround Time and Fig. 2 the Maximum Tardiness.

Fig. 1. Average Turnaround Time

The experiment was performed with a varying workload, scaling the number of processes from 500 to 2000. The results show that performance was maintained in this dynamic environment. Fig. 1 presents the comparative performance of our proposed PDSA against the EDF and RR scheduling algorithms for a variety of synthetic workload traces. The figure depicts that PDSA achieves the best performance compared with the EDF and RR scheduling algorithms under variable and scalable workloads.

Fig. 2. Maximum Tardiness

The experiment was again performed by scaling the number of processes from 500 to 2000, and again performance was maintained in the dynamic environment. Fig. 2 presents the comparative performance of our proposed PDSA against the EDF and RR scheduling algorithms for a variety of synthetic workload traces. The figure depicts that PDSA again achieves the best performance compared with the EDF and RR scheduling algorithms under variable and scalable workloads.

Overall, the comparative performance analysis shows that our proposed PDSA is more efficient than the EDF and RR scheduling algorithms for a variety of synthetic workload traces, achieving the best Average Turnaround Time and Maximum Tardiness under variable and scalable workloads.

VI. CONCLUSION AND FUTURE WORK

In this paper, a scheduling algorithm for executing jobs on grid systems is proposed. As in real-life scenarios, we have considered the dynamic arrival of jobs as well as the deadline requirement of each job to be processed. The experiment was performed with a varying workload, scaling the number of processes from 500 to 2000, and performance was maintained in this dynamic environment. Based on the comparative performance analysis, PDSA has shown the best performance compared with the EDF and RR scheduling algorithms under variable and scalable workloads. We have developed a new simulator in Java to facilitate this research and tested it through extensive experimentation, in which various possible input patterns were run with all the CPU scheduling algorithms and the overall response of the system was monitored. The behavior of the system and the experimental results are in conformity with the established facts and principles of process scheduling; therefore, we believe that the simulator is a valuable contribution to the understanding of modern operating systems. In future work, we will propose and evaluate a computational scheduling algorithm for grids based on multiple processors and perform a detailed comparative performance analysis against other scheduling approaches.

ACKNOWLEDGMENT

We express our gratitude to Dr. Nordin B. Zakaria and all HPCC members of Universiti Teknologi PETRONAS for their help during this research. We thank the HEP e-Science Group at Imperial College London, who provided the LCG data. We also thank Hui Li, the Parallel Workloads Archive and the Grid Workloads Archive for making the data publicly available.

REFERENCES

[1] I. Foster and C. Kesselman, "Globus: A metacomputing infrastructure toolkit," International Journal of High Performance Computing Applications, vol. 2, pp. 115–128, 1997.
[2] F. Dong and S. G. Akl, "Scheduling algorithm for grid computing: State of the art and open problems," Technical Report of the Open Issues in Grid Scheduling Workshop, School of Computing, University Kingston, Ontario, January 2006.
[3] I. Foster, C. Kesselman, and S. Tuecke, "The anatomy of the Grid: Enabling scalable virtual organizations," International Journal of Supercomputer Applications, 2001.
[4] H. Topcuoglu, S. Hariri, and M. Wu, "Performance-effective and low-complexity task scheduling for heterogeneous computing," IEEE Transactions on Parallel and Distributed Systems, vol. 13, no. 3, pp. 260–274, March 2002.
[5] I. Foster and C. Kesselman, Eds., The Grid 2: Blueprint for a New Computing Infrastructure. San Francisco, CA: Morgan Kaufmann, 2004.
[6] J. Blazewicz, W. Domschke, and E. Pesch, "The job shop scheduling problem: Conventional and new solution techniques," European Journal of Operational Research, vol. 93, pp. 1–30, 1996.
[7] R. Buyya, D. Abramson, and J. Giddy, "Nimrod/G: An architecture for a resource management and scheduling system in a global computational grid," Proc. Fourth Int'l Conf. High Performance Computing in Asia-Pacific Region, 2000.
[8] Siriluck Lorpunmanee, Mohd Noor Md Sap, and Abdul Hanan Abdullah, "Fuzzy C-mean and genetic algorithms based scheduling for independent jobs in computational grid," Jurnal Teknologi Maklumat, Jilid 18, Bil. 2, December 2006.
[9] Siriluck Lorpunmanee, Mohd Noor Md Sap, Abdul Hanan Abdullah, and Surat Srinoy, "A static jobs scheduling for independent jobs in Grid environment by using Fuzzy C-Mean and Genetic algorithms," Proceedings of the Postgraduate Annual Research Seminar, 2006.
[10] Siriluck Lorpunmanee, Mohd Noor Md Sap, and Abdul Hanan Abdullah, "Optimalisation of a job scheduler in the Grid environment by using Fuzzy C-Mean," J. J. Appl. Sci., vol. 9, no. 2, 2007.
[11] Florin Pop, Dacian Tudor, Valentin Cristea, and Vladimir Cretu, "Fault-tolerant scheduling framework for MedioGRID system," EUROCON 2007, The International Conference on "Computer as a Tool," Warsaw, September 9–12, 2007.
[12] Bu Yan-ping, Zhou Wei, and Yu Jin-shou, "An improved PSO algorithm and its application to grid scheduling problem," 2008 International Symposium on Computer Science and Computational Technology, IEEE, 2008.
[13] P. Mathiyalagan, U. R. Dhepthie, and S. N. Sivanandam, "Grid scheduling using enhanced PSO algorithm," International Journal on Computer Science and Engineering (IJCSE), vol. 2, no. 2, pp. 140–145, 2010.
[14] Z. Pooranian, A. Harounabadi, M. Shojafar, and J. Mirabedini, "Hybrid PSO for independent task scheduling in grid computing to decrease makespan," 2011 International Conference on Future Information Technology, IPCSIT vol. 13, IACSIT Press, Singapore, 2011.
[15] Snehal Kamalapur and Neeta Deshpande, "Efficient CPU scheduling: A genetic algorithm based approach," International Symposium on Ad Hoc and Ubiquitous Computing (ISAUHC '06), pp. 206–207, 2006.
[16] Jia Yu, Rajkumar Buyya, and Chen Khong Tham, "Cost-based scheduling of scientific workflow applications on utility grids," Proceedings of the First International Conference on e-Science and Grid Computing (e-Science '05), IEEE, 2005.
[17] Daphne Lopez and S. V. Kasmir Raja, "A dynamic error based fair scheduling algorithm for a computational grid," Journal of Theoretical and Applied Information Technology, © 2005–2009 JATIT.
[18] Ivan Rodero, Francesc Guim, and Julita Corbalan, "Evaluation of coordinated Grid scheduling strategies," 11th IEEE International Conference on High Performance Computing and Communications (HPCC '09), pp. 1–10, 2009.
[19] Raksha Sharma, Vishnu Kant Soni, Manoj Kumar Mishra, Prachet Bhuyan, and Utpal Chandra Dey, "An agent based dynamic resource scheduling model with FCFS-job grouping strategy in grid computing," World Academy of Science, Engineering and Tech