
CHAPTER-2

LITERATURE SURVEY

In this chapter we describe the taxonomy and classification of job scheduling
algorithms for Desktop Grids found in the literature. We are interested in the
node dynamism, unreliability and heterogeneity of nodes in a Desktop Grid. We
briefly present the broad classification of job scheduling algorithms, general
job scheduling heuristics, and the techniques existing in the literature to
address the node dynamism, unreliability and heterogeneity issues in the
Desktop Grid.

2.1. ALGORITHMS BASED ON THE ORGANIZATION OF THE SCHEDULER

Job scheduling algorithms are broadly classified into centralized, distributed,
and hierarchical algorithms [87, 35] based on the organization of the scheduler.
In centralized job scheduling, the scheduling decision is taken by a central
server that possesses a global view of the system state; such algorithms suffer
from a single point of failure and limited scalability. In distributed job
scheduling, the scheduling decisions are taken by all participating nodes, each
relying on local, partial information. Hierarchical job scheduling algorithms
combine the merits of the centralized and distributed approaches.

Scheduling algorithms are further classified into static and dynamic based on
the type of information used for job scheduling decisions. Static scheduling
algorithms assume that the complete state of the system is known in advance and
accordingly take scheduling decisions. Dynamic job scheduling algorithms take
scheduling decisions without assuming any prior knowledge of the system state.

2.1.1. Centralized Job Scheduling Algorithms

In the centralized approach, a central coordinator of the grid takes the
scheduling decision for each job. The algorithms described in [52, 36, 23, 60]
use a central job dispatcher/agent to schedule the jobs. The central agent may
be a physical entity [52, 36] or a globally shared file [23, 60] accessible to
all nodes. Javelin++ [57] and Bayanihan [49] are center-based global computing
environments in which a central server manages the computing jobs. From a
performance point of view, the central coordinator policy is the best policy in
the absence of contention at the central coordinator node. In large systems,
however, the coordinator may become a bottleneck that limits the performance
benefits of such a policy; consulting a central server for each decision is
therefore expensive in large distributed systems [74].

The centralized scheduling strategy exhibits the best performance in the
absence of contention at the central coordinator. This strategy has lower
overhead, but suffers from a single point of failure, performance bottlenecks
and limited scalability.

2.1.2. Distributed Job Scheduling Algorithms

In distributed scheduling there is no central coordinator for making scheduling
decisions. Each node runs a local scheduler that takes the scheduling decisions
for its jobs, and collects load information from other nodes to obtain an
estimate of the system state. The overhead of collecting this load information
can be reduced by sampling only a limited number of randomly selected nodes
[19]. Algorithms such as the sender-initiated and receiver-initiated policies
[18] use a queue-length threshold to determine when a node should migrate a job
or request one. It has been reported [73, 74] that distributed scheduling
algorithms are more sensitive to job service times and job transfer times.

2.1.3. Hierarchical Job Scheduling Algorithms

Hierarchical scheduling algorithms combine the merits of the centralized and
distributed approaches while minimizing or eliminating their disadvantages
[35]. In this approach, the nodes are logically divided into clusters, with a
local scheduler for each site or cluster and a global scheduler for the entire
grid [92]. In [7, 93], a cluster manager makes decisions within the cluster
while inter-cluster decisions are taken by a Grid manager. The advantage of
this approach is that the local and global schedulers can adopt different
scheduling policies to handle dynamic conditions in the Grid. The authors of
[59] evaluated the performance of hierarchical load sharing in heterogeneous
distributed systems and showed that it performed similarly to a centralized
load sharing policy for all the workload and system parameters considered in
their study.

2.2. STATIC VERSUS DYNAMIC SCHEDULING ALGORITHMS

Scheduling algorithms are further classified into static and dynamic
scheduling algorithms [84] based on the type of information used for the
scheduling decision. A static distributed scheduling algorithm assumes that
prior knowledge of the system state is available, so each node takes its
scheduling decisions with complete knowledge of the system state. A dynamic
distributed scheduling algorithm takes the scheduling decision with only
partial knowledge of the system state, obtaining the changing system state
either continuously or periodically [54].

2.3. GENERAL JOB SCHEDULING HEURISTICS

This section presents some general job scheduling heuristics reported in the
literature.

XtremWeb [30] is an example of an enterprise Desktop Grid computing framework
that follows a simple scheduling policy such as FIFO for job scheduling. The
authors of [33, 72] used the percentage of CPU idle time as the metric for job
scheduling.

X. He et al. [37, 25] developed the Min-Min and Max-Min algorithms. The Min-Min
heuristic considers the shortest job first and maps it onto the machine that
offers the earliest completion time. The Max-Min heuristic assigns the longest
job to the machine that offers the minimum earliest completion time. Max-Min
performs better than Min-Min when shorter tasks outnumber longer tasks, because
the longer tasks then run in parallel with the shorter ones and thus decrease
the makespan (the total time taken to complete a batch of jobs). Both
algorithms are applicable only to batch-mode scheduling of jobs.
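The two heuristics can be sketched as follows. This is a minimal illustration,
not code from the cited works; the expected-time-to-compute (ETC) matrix and
its values are assumed for demonstration.

```python
def schedule(etc, mode="min-min"):
    """Batch-mode mapping of independent tasks onto machines.

    etc[t][m] = expected execution time of task t on machine m
    (illustrative values, not from any real trace).
    Returns the task-to-machine mapping and the makespan.
    """
    unmapped = set(range(len(etc)))
    ready = [0.0] * len(etc[0])           # machine ready times
    mapping = {}
    while unmapped:
        # For each unmapped task, find its earliest completion time
        # and the machine that provides it.
        best = {t: min((ready[m] + etc[t][m], m) for m in range(len(ready)))
                for t in unmapped}
        # Min-Min picks the task with the smallest such completion time;
        # Max-Min picks the task with the largest.
        pick = (min if mode == "min-min" else max)(best, key=lambda t: best[t][0])
        ct, machine = best[pick]
        mapping[pick] = machine
        ready[machine] = ct
        unmapped.remove(pick)
    return mapping, max(ready)            # mapping and makespan
```

On a toy ETC matrix with two short tasks and one long task, Max-Min yields the
smaller makespan because the long task runs in parallel with the short ones,
matching the observation above.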

Maheswaran et al. [55] presented on-line and batch-mode heuristics for mapping
independent tasks onto heterogeneous computing systems. In on-line mode, each
task is considered only once for mapping: as soon as a task arrives, the
scheduler maps it onto a selected machine without waiting for the next mapping
event. The machine is selected using heuristics such as Minimum Completion
Time (MCT), Minimum Execution Time (MET), Switching, K-Percent Best (KPB) and
Opportunistic Load Balancing (OLB).

In MCT-based scheduling, the task is assigned to the machine that completes it
in the minimum possible time, which keeps the system in a load-balanced state.
In MET-based scheduling, the machine offering the minimum execution time is
selected, which may drive the system into a load-imbalanced state. The
Switching heuristic alternates between MCT and MET based on the load conditions
across the machines. The K-Percent Best heuristic picks the k best machines
based on the execution time of the task and assigns the task to the one that
completes it earliest. OLB-based scheduling assigns the task to a machine that
is currently free and ready, without considering the execution time of the task
on that machine.
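These on-line selection rules can be written compactly. The sketch below is a
hypothetical illustration (function name, data layout and the 30% choice for
KPB are assumptions, not from [55]); the Switching heuristic, which alternates
between MCT and MET, is omitted for brevity.

```python
def map_online(task_etc, ready, heuristic="MCT"):
    """Map one arriving task immediately (on-line mode).

    task_etc[m] = execution time of this task on machine m
    ready[m]    = time at which machine m becomes free
    Returns the chosen machine index and updates its ready time.
    """
    machines = range(len(ready))
    if heuristic == "MCT":        # minimize completion time (ready + exec)
        m = min(machines, key=lambda j: ready[j] + task_etc[j])
    elif heuristic == "MET":      # minimize execution time only (may imbalance load)
        m = min(machines, key=lambda j: task_etc[j])
    elif heuristic == "KPB":      # MCT restricted to the k% fastest machines
        k = max(1, len(ready) * 30 // 100)   # 30% is an assumed setting
        best = sorted(machines, key=lambda j: task_etc[j])[:k]
        m = min(best, key=lambda j: ready[j] + task_etc[j])
    else:                         # OLB: first free machine, ignores execution time
        m = min(machines, key=lambda j: ready[j])
    ready[m] += task_etc[m]
    return m
```

Note how MET and OLB can disagree: a machine that executes the task fastest may
still be the worse choice once its queue (ready time) is taken into account,
which is exactly the imbalance MCT avoids.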

In batch-mode scheduling, the independent tasks are collected into a set called
a meta-task, which is considered for scheduling when the next mapping event
occurs. The batch-mode heuristics followed in [55] are Min-Min, Max-Min, the
Sufferage heuristic and the Duplex heuristic. The Sufferage heuristic gives
priority to the task that would suffer most if its preferred machine were not
assigned to it. The Duplex heuristic [83] combines the Min-Min and Max-Min
heuristics, exploiting the system conditions to adopt whichever of the two
gives better performance.
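The sufferage idea can be sketched as the gap between a task's best and
second-best completion times. This is a simplified one-task-at-a-time version
(an assumption for brevity; the heuristic in [55] resolves machine conflicts
within a mapping event differently):

```python
def sufferage(etc, ready):
    """One mapping pass of a simplified Sufferage heuristic (batch mode).

    For each task, sufferage = second-best completion time minus best
    completion time; the task that would 'suffer' most if denied its
    best machine is mapped first.
    """
    unmapped = set(range(len(etc)))
    mapping = {}
    while unmapped:
        best_task, best_suff, best_m, best_ct = None, -1.0, None, None
        for t in unmapped:
            cts = sorted(ready[m] + etc[t][m] for m in range(len(ready)))
            suff = (cts[1] - cts[0]) if len(cts) > 1 else cts[0]
            if suff > best_suff:
                m = min(range(len(ready)), key=lambda j: ready[j] + etc[t][j])
                best_task, best_suff = t, suff
                best_m, best_ct = m, ready[m] + etc[t][m]
        mapping[best_task] = best_m
        ready[best_m] = best_ct
        unmapped.remove(best_task)
    return mapping
```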

Tabu Search [26] is a solution-space search that keeps track of the regions of
the solution space already searched, so as not to repeat a search near those
areas. Genetic Algorithms [50], Simulated Annealing [34] and Genetic Simulated
Annealing [65] are generally used for searching large solution spaces.

A Genetic Algorithm mimics the process of natural evolution, using techniques
such as inheritance, mutation, selection and crossover to generate a
near-optimal solution to the problem of selecting a target node [50].

Simulated Annealing [34] is a global optimization technique for selecting a set
of nodes for scheduling a batch of jobs. The annealing scheduler usually finds
a schedule with a better estimated execution time than ad-hoc greedy scheduling
techniques. The performance of the annealing scheduler depends entirely on the
performance model: if the model is accurate, the scheduler returns accurate
results [34].

Genetic Simulated Annealing (GSA) combines the genetic algorithm (GA) and
simulated annealing (SA) techniques. GSA follows the procedure used in the GA
and, in addition, uses an annealing schedule to generate optimal solutions by
accepting or rejecting alternate solutions from the solution space [65].

Some research projects [69, 75] have taken ‘profit’ into account and applied
economic models to grid resource scheduling. The incentive-based scheduling
presented in [91] aims to build a global computational grid in which every
participant receives an incentive or disincentive based on performance: for
example, when a resource provider causes a job to miss its deadline, a penalty
is imposed on the provider.

Dinda [26] has done significant work on the selection of resources based on
host load predictions. He conducted a comparative study of the prediction
strategies existing in the literature and demonstrated how to predict CPU load
accurately in a dynamically changing environment, observing that
neural-network-based predictions give better accuracy than statistical methods
such as linear regression and polynomial fitting.

The authors of [88] evaluated scalable search methods for selecting a candidate
host for a job, namely expanding ring search, random walk search,
advertisement-based search and rendezvous point search.

In expanding ring search, when a client node needs idle cycles it sends a
request to its immediate neighbors; if the neighbors are busy or not enough
candidates are available, it sends the request to nodes one hop farther. This
method suits environments where the nodes are spread at different distances
from each other. In random walk search, the client node sends the request to
‘k’ random neighbors and selects the node with the minimum load.
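The random walk selection step is simple enough to state directly. The sketch
below is hypothetical (the function name and the `neighbors`/`load` structures
are assumptions for illustration, not from [88]):

```python
import random

def random_walk_select(client, neighbors, load, k=3):
    """Probe k randomly chosen neighbors of `client` and return the one
    reporting the minimum load.

    neighbors[n] = list of neighbor ids of node n
    load[n]      = current load estimate reported by node n
    """
    probes = random.sample(neighbors[client], min(k, len(neighbors[client])))
    return min(probes, key=lambda n: load[n])
```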

In advertisement-based search, when a node joins the system it sends its
profile to the neighbors within a limited scope, and the neighbors cache this
profile along with the node ID for future use.

In rendezvous point search, a group of dynamically selected rendezvous points
is used for information gathering. Hosts advertise their profiles to the
nearest rendezvous point(s), and the client contacts the rendezvous point(s) to
locate available hosts. In both advertisement-based search and rendezvous point
search, the choice of the number of rendezvous points and the frequency of
information gathering influences performance, because the grid is a dynamic
environment in which the system state changes continuously.

The scheduling decisions taken by meta-schedulers in fuzzy systems are
generally based on the knowledge base of the fuzzy system, and the learning
process in such systems is critical. Garcia-Galan et al. [76] proposed swarm
fuzzy systems (SFS), based on bio-inspired knowledge acquisition. Their
simulation results show that SFSs achieve faster convergence and higher quality
with a reduced number of control parameters compared to genetic fuzzy systems.

Resource discovery and job scheduling are challenging areas in grid computing.
The authors of [47] proposed a bee-colony-based algorithm for resource
discovery in which discovered resources are reserved for future allocation to
jobs. The advantage of this approach is that the delay involved in waiting for
the resources required for job execution can be reduced, but there is a
possibility of the system entering deadlock.

Nithyapriya and Krishnamoorthy [63] proposed a priority-based job scheduling
algorithm for grid computing in which priorities are assigned to both jobs and
resources: jobs are prioritized by their location and resources by their
computing power. The algorithm schedules high-priority jobs onto high-priority
resources so that jobs complete faster and throughput increases, but the spare
capacity of low-priority resources may be under-utilized.

Javid Taheri et al. [43] proposed a novel heuristic job scheduling algorithm
using Hopfield neural networks that schedules jobs together with data
replication. Data replication is essential because job execution requires data
available at different resource sites in the Grid; replicating data reduces the
delivery time of data files to their dependent jobs, which minimizes the
overall makespan and improves system performance.

2.4. SCHEDULING STRATEGIES TO HANDLE NODE UNRELIABILITY

Fault-tolerant, trust-based and agent-based techniques are gaining popularity
for dealing with the reliability of resources in an unreliable computing
environment. This section outlines these techniques as reported in the
literature.

Reliability gives a measure of robustness. The model proposed in [42] estimates
reliability statistically, based on a node’s prior performance and behavior.
Each worker in the model is assigned a reliability rating based on the
correctness of the results it generates. The technique adopted is redundant
computation coupled with voting on the returned results: the server treats the
result generated by a majority of the workers (a quorum) as correct and
increases those workers’ reliability ratings, while decreasing the ratings of
the others. BOINC (Berkeley Open Infrastructure for Network Computing) [2, 20]
adopts the same technique.
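The voting-with-ratings scheme can be sketched in a few lines. This is a
hypothetical simplification (the fixed rating step `delta` and the function
name are assumptions, not the exact scheme of [42] or BOINC):

```python
from collections import Counter

def vote_and_rate(results, ratings, delta=0.1):
    """Majority voting over redundant results, with reliability updates.

    results: {worker_id: returned_value} for one redundantly computed job
    ratings: {worker_id: reliability in [0, 1]}, updated in place
    Workers in the majority (quorum) gain reliability; the others lose it.
    Returns the value accepted as correct.
    """
    majority_value, _ = Counter(results.values()).most_common(1)[0]
    for w, v in results.items():
        if v == majority_value:
            ratings[w] = min(1.0, ratings[w] + delta)
        else:
            ratings[w] = max(0.0, ratings[w] - delta)
    return majority_value
```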

Reliability ratings are generally learned over time. Bayanihan [49] proposed
majority voting and spot-checking based on a credibility-enhanced eager
scheduling algorithm, in which higher credibility is given to volunteers that
pass spot-checking more often.

2.4.1. Fault Tolerant Techniques


Node failure is common in a Desktop Grid computing environment. Volunteer nodes
in a Desktop Grid are not dedicated to public execution, which can be
interrupted at any time by the higher-priority private execution of the node
owner. Such failures may be transient or definitive [50]: in a transient
failure the node disappears for a short period of time and then reconnects to
the system, whereas in a definitive failure the node crashes and never
reconnects during that session. Node failure during the execution of a job has
a significant impact on system performance, so nodes adopt different recovery
schemes to handle such situations [64]. The fault-tolerant schemes/policies
available in the literature are the restart, recover (resume), relocate and
replication policies.

In the resume policy, periodic checkpointing must be done so that a job can
resume from the point of failure; this is an additional overhead for the nodes.
The restart policy does not require checkpointing; the job is restarted from
the beginning after the node recovers. In the relocation policy, the source
node monitors and checkpoints the execution status of the job and, when the
node fails, relocates the job to a different node. Each policy therefore has
its own merits and demerits.

i. Restart Policy

In the restart scheme, nodes do not remember anything about the execution
status of the task; the task is restarted from the beginning after the node
recovers. The divide-and-conquer applications reported in [40, 68, 71] follow
the restart mechanism for executing the sub-tasks of a main task: the main
processor distributes the sub-tasks to other processors and, if it fails
suddenly, the orphans caused by this processor are discarded and the
computation is redone. The result of the main task is obtained by combining the
results of the sub-tasks. This is a simple approach in which the nodes incur no
overhead in maintaining job execution status.

Derrick Kondo et al. applied the restart policy to tasks that fail during
execution [14]. They assumed that the number of times a task restarts after
node failure follows a geometric distribution, computed the probability that a
task finishes after a certain number of attempts on the same host, and used
this metric to obtain rapid turn-around times. They applied three scheduling
heuristics, resource prioritization, resource exclusion and task replication,
to select a suitable resource for each task. Resource prioritization ranks
resources by their clock rates; resource exclusion uses the application’s
makespan to exclude slow resources; and task replication replicates tasks on
multiple hosts so that delays due to task failures at the end of the
application can be avoided. Wasting CPU cycles is the drawback of this last
method.
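Under the geometric model, the quantities used above follow directly: with
per-attempt success probability p (an assumed parameter), the chance that a
task finishes within n attempts on the same host is 1 - (1-p)^n, and the mean
number of attempts is 1/p.

```python
def prob_finish_within(p, n):
    """P(task completes within n attempts) when each independent attempt
    succeeds with probability p (geometric model)."""
    return 1.0 - (1.0 - p) ** n

def expected_attempts(p):
    """Mean number of attempts until the first success: 1/p."""
    return 1.0 / p
```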

ii. Recover Policy

In the recover scheme, a failed task knows exactly where it stopped and
continues from that point after the node recovers. To implement this scheme,
the nodes must periodically checkpoint the execution status of the job. Grid
systems such as Condor [52] and Cactus [8] use checkpointing to support cycle
stealing, and Dynamite [45] uses checkpointing to support load balancing. Gosia
et al. proposed a recovery mechanism in [31] for divide-and-conquer
applications that restructures the computation tree from the orphan tasks
created by the processors using a simple broadcast message: the job ID and the
processor ID are broadcast, the link between the restarted parent and the
orphan child is restored, and re-computation is thus avoided.

The authors of [11] discussed the issues involved in checkpointing when jobs
run in an unreliable computing environment and presented mathematical
formulations for the mean time to complete a job with and without
checkpointing. They also noted that the number of times a task fails before it
finally executes successfully follows a geometric distribution. However, they
did not address the role of repair time in their model, even though it has a
significant impact on job completion times.

The execution time (length) of a task can be chosen with either probabilistic
or deterministic modeling. A probabilistic model uses a probability
distribution to determine the task length; Li and Antonio [90] proposed an
execution time distribution for a task graph in heterogeneous computing
systems. A deterministic model uses a fixed execution time for each task, which
can be determined from the execution history of the applications [70].

iii. Relocation Policy

Nodes in a grid environment are located at widely spread geographical
locations, so nodes enter night-time in the order of the time zones around the
world and idle computing times become available on a human time scale. D. Zhou
and Virginia Lo [21] took advantage of this combination of night-zone and
day-zone nodes and presented wave scheduling algorithms, namely the Migration
Immediate, Wave Immediate and Migration Linger algorithms.

In Migration Immediate, the client schedules the task on a machine that is
immediately available; when the node fails, the task is immediately migrated to
a randomly chosen available host. In Wave Immediate, the task is migrated to a
night-zone machine. In Migration Linger, the task is allowed to stick to the
same host even after its failure for a random amount of time; if the host is
still not available after the specified interval, the task then migrates. The
choice of lingering time influences the turn-around time of the job.

iv. Replication Policy

Congfeng Jiang et al. [13] proposed a replication-based job scheduling
algorithm that uses adaptive job replication to make the scheduling decision
secure, reliable and fault tolerant. The algorithm adjusts the number of
replicas according to the current security conditions and end-user settings to
handle risky, failure-prone grids.

Task failures [15] at the end of an application due to unpredictable node
failures delay the execution of the application. This problem can be mitigated
by replicating the task on multiple hosts or scheduling it on a faster node,
which reduces the probability of task failure. Task replication, however,
entails redundant computation.

2.4.2. Trust-Based Scheduling Strategies

Trust represents an estimate of how likely a resource is to fulfil its QoS
commitment. Farag Azzedin and M. Maheswaran proposed the following definition
of trust in [28]:

“Trust is the firm belief in the competence of an entity to act as expected such that
this firm belief is not a fixed value associated with the entity but rather it is subject
to the entity’s behavior and applies only within a specific context at a given time”

Trust is a major concern for the consumers and producers of services who
participate in the Grid [29]. Some resource consumers may not want their
applications mapped onto resources that are owned and/or managed by entities
they do not trust, and similar concerns apply on the resource producer side.
The authors of [29] presented a mechanism that integrates trust into resource
management so that the allocation process is aware of the security
implications. Three scheduling heuristics, minimum completion time, Min-Min
scheduling and the Sufferage heuristic, were modified to incorporate the trust
notion, and simulations showed that incorporating trust improved their
performance.

Reputation is what is generally said or believed about a person’s or thing’s
character or standing [5]. The authors argue that reputation is a means of
building trust, as one can trust another based on a good reputation. When
making trust-based decisions, entities can rely on others for information
pertaining to a specific entity. For example, if x must decide whether to have
a transaction with an entity y that is unknown to it, x can rely on the
reputation of y. Reputation is therefore a measure of trustworthiness, in the
sense of reliability.

According to Abdul-Rahman and Hailes [6], reputation is defined as

“Reputation is an expectation about an agent behavior based on information
about or observation of its past behavior.”

The trust based model proposed in [6] allows entities to decide which other
entities are trustworthy (trust) and also allows entities to tune their trust based on the
understanding of another entity’s recommendations (reputation).

Reputation is computed from trust ratings provided by users of services through
a feedback mechanism. Reputation-based product selection has proved to be a
great asset for online sites such as eBay [4] and Amazon Auctions [1]. The
reputation model is implemented as a centralized rating system so that users
can rate and learn about each other’s reputation. For example, an eBay user can
rate a transaction partner on a scale of -1, 0, or +1 after an interaction; the
ratings are stored centrally and the reputation value is computed as the sum of
those ratings over six months.

2.4.3. Agent-Based Scheduling Strategies

Volunteers in a Desktop Grid are heterogeneous and possess different properties
such as autonomy failure, availability and service time. To exploit the
advantages of these properties, the authors of [82] proposed a Mobile Agent
based Adaptive Scheduling Mechanism (MAASM) that groups the volunteers by their
properties. MAASM classifies volunteers into groups and exploits mobile agents
to adapt different scheduling, fault-tolerance and replication algorithms
suitable for each volunteer group.

A Fault-Tolerant Mobile Agent System based on the agent-dependent approach
(FATOMAS), presented in [78], is a spatial-replication-based approach in which
the fault-tolerance mechanism travels along with the agent rather than being
tied to the platform. The advantage of this approach is that FATOMAS can be
overlaid on any mobile agent platform without changing the existing platform;
in addition, the mobile agents can travel to places where no fault-tolerance
mechanism is supported.

A comparison of the replication and checkpointing schemes is discussed in [85].
The authors analyzed the schemes by varying parameters such as failure rate and
blocking time. The checkpointing scheme shows very stable performance and is
sensitive only to the blocking time when the failure rate is high, whereas the
replication scheme is affected by system factors such as agent size and the
number of replicas. However, by properly selecting some controllable parameter
values, the replication scheme can match the performance of the checkpointing
scheme.

Agent-based techniques are well suited to addressing dynamism in grid computing
environments. However, choosing the number of mobile agents, their locations
and the mechanism for exchanging data among them are major issues.

2.5. LIMITATIONS IDENTIFIED

The following limitations are identified from the existing algorithms in the
literature.

i. Most scheduling heuristics make greedy choices based on the transitory
completion times of the jobs. These methods do not consider changing
environmental variables such as node reliability and dynamically varying load
conditions in the Desktop Grid when job transmission latencies are significant.

ii. Some algorithms make scheduling decisions based on information collected
periodically from the nodes. The grid is a dynamic environment, so the system
load statistics change continuously with time, and the collected information
may become outdated before the next collection cycle.

iii. Scheduling a job to a remote node involves transmission latency. By the
time the job reaches the remote node, the node may have become busy with new
local and remote jobs, so the selected target node may no longer be the best
node for executing the job.

iv. The Migration Linger and Migration Immediate algorithms require a
coordinator to keep track of the progress of job execution at the remote node.

v. The Wave Immediate algorithm migrates the job to a night-zone machine on
the assumption that night-zone machines are lightly loaded compared to
day-zone machines. It is therefore applicable only to environments in which
nodes are available in a combination of night zones and day zones.

vi. The Resource Prioritization and Resource Exclusion algorithms avoid mapping
jobs onto low-speed machines, which leads to under-utilization of lightly
loaded low-speed machines.

vii. Task Replication executes a job at multiple nodes, which leads to
redundant computation and thus wastes idle CPU cycles.

2.6. OUR SOLUTION APPROACH

The nodes connected in the Desktop Grid are the volunteered desktops of
individual users. Job scheduling plays an important role in the effective
utilization of the resources connected in the Desktop Grid, and job scheduling
algorithms must be designed with the following Desktop Grid issues in mind.

 Dynamic: Jobs are generated at each node at random intervals of time, so the
load conditions at the nodes change dynamically with time.

 Node Heterogeneity: The nodes connected in the Desktop Grid possess different
clock speeds.

 Task Heterogeneity: The jobs generated in the Desktop Grid are independent
and possess different processing times.

 Un-Reliable Nodes: The nodes connected in the Desktop Grid encounter
volunteer interference and may leave public execution in the middle of a job
without prior notice. The nodes have different volunteering times and times of
donation.

 Distributed: There is no central coordinator to schedule the jobs. Each node
possesses a local scheduler and schedules its locally generated jobs based on
that scheduler’s recommendations.

Based on our study of the literature and the applicability of the various
algorithms, we have identified the following techniques for designing job
scheduling strategies that deal with the issues connected to the job scheduling
problem.

 k - Random walk search

 Trust

 Neural Network Predictions

k-random walk search is a search technique proposed in the literature for
randomly choosing a target node to which a task is assigned. It is a simple
technique for network exploration and is suitable for searching unstructured
networks. A random walk traverses the nodes of a network by moving from the
current node to a randomly chosen neighboring node. Only limited resources are
needed for the search because decisions about where to go next are taken
locally. The study conducted in [67] shows that searching by random walks
outperforms the standard approach of searching by flooding (flooding explores
the network one hop at a time and hence increases the coverage space
exponentially). Random walks offer simplicity, low overhead and robustness to
structural changes, and are widely used for locating a resource or service
efficiently in peer-to-peer networks [39, 61, 62, 12]. Therefore, in our study
we have applied the concept of random walk search and proposed the following
scheduling strategies:

 Reliable Node Job Scheduling Strategy (RJSS)

 Un-Reliable Node Job Scheduling Strategy (UJSS)

RJSS and UJSS integrate the concept of k-random walk search with load estimation statistics to select the target node for job scheduling. The detailed study of these techniques is presented in Chapter-3 and Chapter-4.
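As a rough sketch of how a k-step random walk can be combined with load estimates to pick a target node (the function and variable names are our own, and this is not the exact RJSS/UJSS procedure):

```python
import random

def k_random_walk_select(start, neighbors_of, load_of, k=3):
    """Walk k random hops from `start`, remembering the least-loaded node
    visited; every decision is purely local. Illustrative names only."""
    best, best_load = start, load_of(start)
    current = start
    for _ in range(k):
        nbrs = neighbors_of(current)
        if not nbrs:
            break
        current = random.choice(nbrs)        # one random hop
        if load_of(current) < best_load:
            best, best_load = current, load_of(current)
    return best

# Toy ring topology with static load estimates (queue lengths, say).
topology = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
loads = {0: 9.0, 1: 4.0, 2: 1.0, 3: 7.0, 4: 5.0}
target = k_random_walk_select(0, topology.get, loads.get, k=4)
print("schedule job on node", target)
```

Because the walk only consults the neighbors of the current node, its overhead stays constant regardless of network size, which is the property that makes it attractive for unstructured Desktop Grids.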

Identifying the trustworthiness of a resource, based on its earlier behavior and grid interactions, is an emerging research area. The chances of failure reduce greatly when resources with higher trust are chosen. Various trust models have been proposed in the literature to evaluate the trustworthiness of the resources connected in the Grid. Kavitha et al. [48] proposed a method that computes a quantitative trust value from past interactions and present environment characteristics to select a suitable resource for a job, eliminating run-time failures due to incompatible user-resource pairs. Shashi Bhanwar et al. [10] proposed a trust model that establishes and evaluates trust by computing the reputation and trustworthiness of the transacting domain on the basis of the number of past transactions and rated feedback scores. The experimental evaluations conducted on various models [29, 10, 32] show that the chances of task failure decrease when a trust model is integrated with the job scheduling algorithm. Inspired by this, we have integrated a trust model into our scheduling strategy and proposed the Trust Based Job Scheduling Strategy (TJSS) to deal with the issues connected to the job scheduling problem. The detailed description of TJSS is presented in Chapter-5.
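A hedged sketch of the kind of trust computation these models describe, blending a node's past success ratio with rated feedback, might look as follows (the formula, weights, and threshold are illustrative assumptions, not the thesis's exact model):

```python
def trust_value(successes, failures, feedback_scores, alpha=0.7):
    """Blend direct experience (success ratio over past interactions) with
    reputation (mean rated feedback in [0, 1]); alpha weights experience."""
    total = successes + failures
    direct = successes / total if total else 0.5          # neutral prior
    reputation = (sum(feedback_scores) / len(feedback_scores)
                  if feedback_scores else 0.5)
    return alpha * direct + (1 - alpha) * reputation

def pick_trusted(history, threshold=0.6):
    """Drop nodes below the trust threshold, then pick the most trusted."""
    scored = [(trust_value(s, f, fb), node) for node, s, f, fb in history]
    eligible = [(t, node) for t, node in scored if t >= threshold]
    return max(eligible)[1] if eligible else None

history = [  # (node, successes, failures, feedback scores)
    ("A", 8, 2, [0.9, 0.8]),
    ("B", 1, 4, [0.3]),
    ("C", 5, 0, [0.7, 0.9, 0.8]),
]
print(pick_trusted(history))  # C: unblemished record and good feedback
```

The threshold step captures the central claim of the cited models: filtering out low-trust volunteers before scheduling reduces the chance of task failure.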

An Artificial Neural Network (ANN) mimics the neural network of the biological brain in a mathematical model. The primary significance of an ANN is its ability to learn from its environment and to improve its performance through that learning. A neural network trained on historical data captures the hidden dependencies in the data and can then predict future behavior. The ability to accurately predict the host load of a system is significant in computational grids for making efficient use of shared resources, and host load forecasting in grid/cloud computing using ANNs has become a major area of research in recent years. The experiments conducted in the literature [86, 56, 24] show that resource selection using ANN predictions performs better than non-prediction-based methods. Therefore, in this work we apply ANN predictions to forecast the future load statistics of the nodes from their past load history; knowing the load statistics of a node beforehand helps in taking more efficient scheduling decisions. Considering the capability of ANNs to produce accurate predictions, we have applied them to the issues connected to the job scheduling problem and proposed a Prediction based Job Scheduling Strategy (PJSS). The detailed description of PJSS is presented in Chapter-6.
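As an illustration of sliding-window load prediction, the sketch below trains a single linear neuron on a load trace. A real ANN of the kind used in PJSS would add hidden layers, but the windowing and gradient-descent training loop have the same shape; all names and hyperparameters here are assumptions:

```python
def train_load_predictor(history, window=3, lr=0.1, epochs=2000):
    """Fit a single linear neuron mapping the last `window` load samples to
    the next one -- a dependency-free stand-in for a full ANN predictor."""
    w, b = [0.0] * window, 0.0
    samples = [(history[i:i + window], history[i + window])
               for i in range(len(history) - window)]
    for _ in range(epochs):
        for x, y in samples:                       # stochastic gradient step
            err = sum(wi * xi for wi, xi in zip(w, x)) + b - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return lambda x: sum(wi * xi for wi, xi in zip(w, x)) + b

# A periodic load trace; the predictor learns the pattern from history.
trace = [0.2, 0.4, 0.6, 0.2, 0.4, 0.6, 0.2, 0.4, 0.6, 0.2, 0.4]
predict = train_load_predictor(trace)
print(round(predict(trace[-3:]), 2))   # next sample should be close to 0.6
```

A scheduler would call the trained predictor on each candidate node's recent load samples and prefer the node with the lowest forecast load.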

A combined Trust and Prediction Based Job Scheduling Strategy (TPJSS), which unites the merits of the trust and prediction methods, is also proposed. The detailed description of TPJSS is presented in Chapter-7.
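One simple way such a combination could score candidate nodes is shown below; the weighting rule is an illustrative assumption on our part, not the actual TPJSS formula:

```python
def tpjss_score(trust, predicted_load, beta=0.5):
    """Combine trust (higher is better) with predicted load (lower is
    better) into one score; beta trades off the two components."""
    return beta * trust + (1 - beta) * (1.0 / (1.0 + predicted_load))

# node -> (trust value, predicted load)
candidates = {"A": (0.9, 8.0), "B": (0.6, 1.0), "C": (0.8, 2.0)}
best = max(candidates, key=lambda n: tpjss_score(*candidates[n]))
print(best)  # C: good trust combined with a low predicted load
```

The point of the combination is that neither criterion alone suffices: a highly trusted but heavily loaded node (A) and a lightly loaded but less trusted node (B) can both lose to a node that balances the two (C).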

2.7. SUMMARY

Overall, most of the scheduling algorithms described above focus on periodically collected load profiles and the transitory completion times of the jobs. Some techniques are application-specific and may not apply to a general-purpose computational environment, and some techniques are not scalable.

Taking this into account, in this work we propose scheduling strategies that analyze the behavior of load variations in the nodes when job transmission latencies are significant. In the next chapter, we discuss the RJSS technique for scheduling jobs in a reliable computing environment.
