CLUSTERING OF COMPUTERS
Vivek Kapoor*, Rishabh Deshmukh** and Harsh Kara***
*Institute of Engineering & Technology, Devi Ahilya University, Indore
** Institute of Engineering & Technology, Devi Ahilya University, Indore
ABSTRACT: A computer cluster is a group of linked computers, working together closely so that in many respects they
form a single computer. The components of a cluster are commonly, but not always, connected to each other through fast
local area networks. Clusters are usually deployed to improve performance and/or availability over that provided by a
single computer, while typically being much more cost-effective than single computers of comparable speed or
availability. The major objective in the cluster is utilizing a group of processing nodes so as to complete the assigned job
in a minimum amount of time by working cooperatively. The main and vital strategy to achieve such objective is by
transferring the extra loads from busy nodes to idle nodes.
KEYWORDS: Network File System, Secure Shell, GNU Compiler Collection, Clustering, Message Passing Interface.
GENERAL TERMS
NFS (Network File System)
Network File System is a distributed file system protocol that permits a client computer to access files over a
network much as it accesses local storage. NFS, like many other protocols, builds on the Open Network Computing
Remote Procedure Call (ONC RPC) system. NFS is often used with UNIX operating systems (such as Solaris, AIX and
HP-UX) and Unix-like operating systems (such as Linux and FreeBSD). It is also available for operating systems such
as the classic Mac OS, OpenVMS, Microsoft Windows, Novell NetWare, and IBM AS/400.
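As an illustrative sketch of such a setup (the export path /mirror, the 192.168.1.0/24 subnet, and the hostname "master" are assumptions for the example, not details from this paper; the commands require root privileges and a running NFS server, so they are shown for reference only):

```shell
# On the master (NFS server): export a shared work directory by
# adding a line like this to /etc/exports:
#   /mirror 192.168.1.0/24(rw,sync,no_subtree_check)
sudo exportfs -ra                     # re-read /etc/exports

# On each child (NFS client): mount the master's export so all
# nodes see the same files at the same path.
sudo mkdir -p /mirror
sudo mount -t nfs master:/mirror /mirror
```

Keeping the shared directory at the same path on every node simplifies later steps, since MPI programs can then refer to one common binary location.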
MPICH (Message Passing Interface)
MPICH is a freely available, portable implementation of MPI, a standard for message passing in distributed-memory
parallel applications. MPICH is free software and is available for most flavours of Unix-like operating systems
(including Linux and Mac OS X). It is a high-performance and widely portable implementation of the Message-Passing
Interface (MPI) standard, and runs on parallel systems of all sizes, from multicore nodes to clusters to large
supercomputers.
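As a sketch of how MPICH is typically driven (the program name cpi.c, the machine file, and the host names are illustrative assumptions, and the commands require an MPICH installation plus reachable nodes, so they are shown for reference only):

```shell
# machinefile lists one hostname per line, e.g. (assumed names):
#   master
#   child1

# Compile an MPI program with the MPICH wrapper compiler:
mpicc -O2 -o cpi cpi.c

# Launch 4 processes spread across the nodes in the machine file:
mpiexec -f machinefile -n 4 ./cpi
```

Placing the compiled binary in the NFS-shared directory means every node launches the same executable without copying it around by hand.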
SSH (Secure Shell)
Secure Shell is a cryptographic (encrypted) network protocol that allows remote login and other network services to
operate securely over an insecure network. SSH provides a secure channel over an insecure network in a client-server
architecture, connecting an SSH client application with an SSH server. The encryption used by SSH is designed to
provide confidentiality and integrity of data over an unsecured network. SSH uses public-key cryptography to
authenticate the remote computer and allow it to authenticate the user, if necessary. There are several ways to use SSH;
one is to use automatically generated public-private key pairs to simply encrypt a network connection, and then apply
password authentication to log on.
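A minimal sketch of key-based SSH setup follows (the key location is a temporary demo path rather than the usual ~/.ssh, and the user/host names in the comment are assumptions, not taken from the paper):

```shell
# Generate an RSA key pair with an empty passphrase for
# non-interactive logins from the master node.
keydir=$(mktemp -d)
ssh-keygen -t rsa -N "" -f "$keydir/id_rsa" -q

# The public key would then be installed on every child node, e.g.:
#   ssh-copy-id -i "$keydir/id_rsa.pub" user@child1
# after which "ssh user@child1" logs in without a password prompt.
ls "$keydir"
```

Passwordless key-based login is what lets the master node start processes on the children automatically when a job is launched.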
GCC (GNU Compiler Collection)
The GNU Compiler Collection (GCC) is a compiler system produced by the GNU Project supporting a variety of
programming languages. GCC is a key component of the GNU toolchain. It has been ported to a variety of processor
architectures, and is widely deployed as a tool in the development of both free and proprietary software. GCC is also
available for most embedded platforms, including Symbian. The official compiler of the GNU operating system, GCC
has been adopted as the standard compiler by many other modern Unix-like operating systems, including Linux and the
BSD family, although FreeBSD and OS X have since moved to the LLVM system.
INTRODUCTION
A computer cluster consists of a set of loosely or tightly coupled computers that work together so that, in many
respects, they can be viewed as a single system [5] [11]. Unlike grid computers, computer clusters have each node set to
perform the same task, controlled and scheduled by software.
The components of a cluster are usually connected to each other through fast local area networks, with each node
(computer used as a server) running its own instance of an operating system. In most circumstances, all of the nodes use
the same hardware and the same operating system, although in some setups (e.g. using Open Source Cluster Application
Resources), different operating systems and/or different hardware can be used on each node. Clusters are usually
deployed to improve performance and availability over that of a single computer, while typically being much more
cost-effective than a single computer of comparable speed or availability.
A growing range of options exists for cluster interconnection technology, and several variables determine the network
hardware for the cluster: price, bandwidth, latency, and throughput are the key ones. The choice of network technology
depends on a number of factors, including price, performance, and compatibility with the other cluster hardware and
system software, as well as the communication characteristics of the applications that will use the cluster. The trend in
parallel computing is to move away from traditional specialized supercomputing platforms towards cheaper,
general-purpose systems consisting of loosely coupled components built from single- or multiprocessor PCs or
workstations. This approach has a number of advantages, including the ability to build, from a given budget, a platform
that is suitable for a large class of applications and workloads.
LITERATURE REVIEW
The last decade has seen a considerable increase in commodity computing and network performance, mainly as a result
of faster hardware and more sophisticated software [4]. With the proliferation of high-performance workstations and the
current trend towards high-speed computer networks, network-based distributed computing has attracted a great deal of
attention. The availability of powerful microprocessors and high-speed networks as commodity components has enabled
high-performance computing on distributed systems (wide-area cluster computing). In this setting, as the resources are
usually distributed at various levels (department, enterprise, or worldwide), there is a great challenge in integrating,
coordinating and presenting them as a single resource to the user, thus forming a computational Grid [9]. The collective
computing power of a group of general-purpose workstations is comparable to that of supercomputers [8]. Moreover, it
has been shown that the average utilization of a cluster of workstations is only around 10%; as a result, around 90% of
their computing capacity sits idle. This unutilized portion of the computing power is substantial and, if exploited, can
provide a cost-effective alternative to expensive supercomputing platforms. During the heyday of the supercomputer,
access to hardware capable of performing parallel processing was limited and often expensive, and complex high-end
general-purpose applications could take months to generate results. Demanding applications such as weather
forecasting, seismic analysis, and evolutionary computation require substantial computational power to run their
programs. Clusters also offer an excellent platform for solving a variety of parallel and distributed problems in both
scientific and commercial domains [1].
METHODOLOGY
To create a cluster, all nodes should have a compatible operating system that provides support for clustering. All nodes
must have some means of using the shared data; this can be implemented with the Network File System (NFS). Next, a
message passing interface is needed for sending instructions between the nodes, which can be provided by MPICH
(MPI: Message Passing Interface, CH: Chameleon). For sending login credentials between nodes we did not use
TELNET, because it sends messages as plain text; instead we used SSH (Secure Shell), which sends messages in
encrypted form [3]. Most services require mutual authentication before carrying out their functions; this guarantees
non-repudiation and data security on both sides [4]. Now
406 Seventh International Conference on Advances in Computer Engineering - ACE 2016
assign a unique IP address and hostname to each node so that the nodes can communicate with each other over the
network. After this, mount the shared folder on all the child nodes so that they can access the shared files [12]. A
mechanism is needed by which the master node can easily log in to the child nodes. The master node can then use the
computing power of all the child nodes concurrently by dividing and scheduling the assignment among them.
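The IP/hostname assignment described above could be recorded in /etc/hosts on every node, for example (the addresses and names below are assumptions for a private LAN, not the paper's actual configuration):

```
192.168.1.1  master
192.168.1.2  child1
192.168.1.3  child2
```

With identical entries on every node, the machine file, NFS mounts, and SSH logins can all refer to nodes by name rather than by raw IP address.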
RESULTS AND DISCUSSION
Following are the results obtained when clustering is performed on two Linux-based computers. One computer is the
master and the other is the slave (child).
The master node manages and schedules the execution of the program and performs load balancing.
Case-1 (when the program runs on the master node)
Impact on the following resources of the master node:
CPU: according to the report produced by the System Monitor, the utilization of the individual CPU cores is:
Core1 = 30.1%
Core2 = 19.8%
Core3 = 29.7%
Core4 = 14.6%
Average CPU utilization =
(30.1 + 19.8 + 29.7 + 14.6)/4 = 23.55%.
Network utilization: the incoming and outgoing data transfer rates are 0 bytes/second.
Fig. 1.1: When the program runs on the master node.
Case-2 (when the program runs on the child node)
Impact on the following resources of the master node:
CPU: according to the report produced by the System Monitor, the utilization of the individual CPU cores is:
Core1 = 17.2%
Core2 = 14.1%
Core3 = 16.8%
Core4 = 18.4%
Average CPU utilization =
(17.2 + 14.1 + 16.8 + 18.4)/4 = 16.625%.
Network utilization: the incoming data transfer rate is 656 KBps;
the outgoing data transfer rate is 3.7 KBps.
Fig. 1.2: When program runs on child node.
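The two averages reported above can be reproduced with a one-line computation per case (this is merely a sanity check of the arithmetic, not part of the original experiment):

```shell
# Average per-core CPU utilization for each case, from the
# per-core figures reported by the System Monitor.
case1=$(awk 'BEGIN { printf "%.2f", (30.1 + 19.8 + 29.7 + 14.6) / 4 }')
case2=$(awk 'BEGIN { printf "%.3f", (17.2 + 14.1 + 16.8 + 18.4) / 4 }')
echo "case1=$case1%  case2=$case2%"
```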
Comparison Between Two Cases
Here we can see that the CPU utilization of the master node differs between the two cases, dropping markedly when
the program runs on the child node. This shows that the program is executed using the child node's CPU.
We can also see that network utilization is much higher when the program runs on the child node. The incoming
data rate rises far more than the outgoing data rate because the result of the program, produced on the child node,
has to be transferred back to the master node; the outgoing rate changes very little because only a few control
instructions are transferred from the master to the child node.
LIMITATIONS
Network service failures: in case of a network failure the whole cluster will fail.
Operational errors: there may be operational errors such as improper assignment of IP addresses.
Security of data: since data is shared among all the nodes, there is a chance of security breaches.
Software skew: there may be issues such as incompatibility between the software versions installed on the various nodes.
CONCLUSION AND FUTURE WORK
Clusters are being used to solve many scientific, engineering, and commercial problems. As the demand for
computational power increases day by day, old hardware is not capable of satisfying the growing requirements. This
leads to e-waste, which can be significantly reduced by the clustering mechanism. Since e-waste is far more harmful
than ordinary waste, reducing it is very beneficial for the environment. In a cluster we make use of an assembly of old
hardware to present a single powerful computer, which can be used to carry out tasks that require high computational
power.
Currently many large international Web portals and e-commerce sites use clusters to process customer requests quickly
and to maintain high availability, 24x7 throughout the year. The ability of clusters to deliver high performance and
availability within a single environment is empowering many new and emerging applications and making clusters the
platform of choice.
There are many exciting areas of development in cluster computing. These include new ideas as well as combinations of
old ones that are being deployed in production and research systems. There are efforts to couple multiple clusters, either
located within one organization or spread across multiple organizations, forming what is known as federated clusters
or hyperclusters [2].
In future, dynamic programs can be produced that adjust themselves according to the current cluster conditions [10]. For
example, programs can be written that take into account the number of nodes and their current usage, and divide the
task into corresponding fractions so as to maximize cluster utilization [6] [7].
REFERENCES
[1] Rajkumar Buyya, Hai Jin, Toni Cortes, Cluster Computing. Future generation computer systems, Elsevier, 2002.
[2] Rajkumar Buyya, A Proposal for Creating a Computing Research Repository on Cluster Computing. Monash
University, Melbourne, Australia.
[3] Poonam Dabas, Anoopa Arya, 2003. Grid Computing: An Introduction. UIET, Kurukshetra University, Haryana,
India.
[4] Rajkumar Buyya, Srikumar Venugopal, 2005 A Gentle Introduction to Grid Computing and Technologies.
Computer Society of India.
[5] Chee Shin Yeo, Rajkumar Buyya, Hossein Pourreza, Rasit Eskicioglu, Peter Graham, Frank Sommers, 2003. Cluster
Computing: High-Performance, High-Availability, and High-Throughput Processing on a Network of Computers.
[6] M. Baker, A. Apon, R. Buyya, and H. Jin, “Cluster Computing and Applications,” Encyclopedia of Computer
Science and Technology, vol. 45 (Supplement 30), A. Kent, and J. Williams (eds.), Marcel Dekker, Jan. 2002, pp.
87-125.
[7] Chee Shin Yeo and Rajkumar Buyya, A Taxonomy of Market-based Resource Management Systems for
Utility-driven Cluster Computing, Software: Practice and Experience (SPE), Volume 36, Issue 13, Pages: 1381-1419,
ISSN: 0038-0644, Wiley Press, New York, USA, Nov. 2006.
[8] Mark Baker and Rajkumar Buyya, Cluster Computing: The Commodity Supercomputing, Software: Practice &
Experience, Volume 29, Issue 6, Pages: 551-576, ISSN: 0038-0644, John Wiley & Sons, Inc, New York, USA, May
1999.
[9] Rajkumar Buyya, PARMON: A Portable and Scalable Monitoring System for Clusters, Software: Practice and
Experience, Volume 30, Issue 7, Pages: 723-739, ISSN: 0038-0644, John Wiley & Sons, Inc, New York, USA, June
2000.
[10] Chee Shin Yeo and Rajkumar Buyya, Pricing for Utility-driven Resource Management and Allocation in Clusters,
International Journal of High Performance Computing Applications, Volume 21, Issue 4, Pages: 405-418, ISSN:
1094-3420, SAGE Publications, Thousand Oaks, CA, USA, Nov. 2007.
[11] Mark Baker, Rajkumar Buyya, and Dan Hyde, Cluster Computing: A High-Performance Contender, IEEE
Computer, Volume 32, Issue 7, Pages: 79-80,83, ISSN: 0018-9162, USA, July 1999.
[12] Rajkumar Buyya and Hai Jin, Teaching Parallel Programming on Clusters, A Book Review, IEEE Distributed
Systems (DS Online), IEEE Computer Society Press, Volume 1, Number 2, USA, October 2000.