Linux Clusters Institute:
Cluster Stack Basics
Brett Zimmerman, University of Oklahoma
Senior Systems Analyst, OU Supercomputing Center for Education and Research (OSCER)
A Bunch of Computers
- Users can log in to any node
- Filesystems aren't shared between nodes
- Work is run wherever you can find space
- Nodes maintained individually
4-8 August 2014
What's wrong with a bunch of nodes?
- Competition for resources
- Size and type of problem is limited
- Nodes get out of sync
- Problems for users
- Difficulty in management
Cluster Approach
- Shared filesystems
- Job management
- Nodes dedicated to compute
- Consistent environment
- Interconnect
What's right about the cluster approach?
- Easier to use
- Maximize efficiency
- Can do bigger and better problems
- Nodes can be used cooperatively
The Types of Nodes
- Login -- users log in here
  - Compiling
  - Editing
  - Submitting and monitoring jobs
- Compute -- users might log in here
  - Run jobs as directed by the scheduler
- Support -- users don't log in here
  - Do all the other stuff
What a cluster needs -- the mundane
- Network services: NTP, DNS, DHCP
- Shared storage -- NFS
- Logging: consolidated syslog as a starting point
- Licensing: FlexLM and the like
- Database: user and administrative data
- Boot/provisioning: PXE, build system
- Authentication: LDAP
What a cluster needs -- Specialized
- Interconnect: an ideally low-latency network
- Job manager: resource manager/scheduler
- Parallel storage: gets around the limitations of NFS
Network Services
- NTP: Network Time Protocol, provides clock synchronization across all nodes in the cluster
- DHCP: Dynamic Host Configuration Protocol, allows central configuration of host networking
- DNS: provides name-to-address translation for the cluster
- NFS: basic UNIX network filesystem
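As a concrete sketch of the shared-storage piece, a head node can export /home over NFS and each compute node can mount it at boot. The hostname, network range, and mount options below are assumptions, not recommendations:

```shell
# Head node -- /etc/exports: share /home with the private cluster network
#   /home  10.1.0.0/16(rw,async,no_root_squash)
exportfs -ra                 # re-read /etc/exports after editing

# Compute node -- /etc/fstab: mount the share at boot
#   head:/home   /home   nfs   rw,hard   0 0
mount /home                  # picks up the fstab entry
```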
Logging
- Syslog: the classic system for UNIX logging; an application has to opt in to emit messages
- Monitoring: active monitoring to catch conditions that elective (opt-in) logging doesn't catch
  - Resource manager
  - Nagios/Cacti/Zabbix/Ganglia
- IDS: intrusion detection; monitoring targeting misuse of/attacks on the cluster
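A minimal sketch of consolidated syslog, assuming rsyslog on the nodes (the log host name is hypothetical):

```shell
# Each compute node -- /etc/rsyslog.d/forward.conf:
#   *.*  @loghost.cluster.local     # @ = UDP, @@ = TCP
# then restart the rsyslog service so the rule takes effect

# Applications opt in to logging by emitting messages, e.g. via logger:
logger -t scheduler "job 1234 started on $(hostname)"
```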
Basic services, continued
- Licensing: FlexNet/FlexLM or equivalent, mediates access to a pool of shared licenses.
- Database: administrative use for logging/monitoring, dynamic configuration. Requirements of user software.
- Boot/provisioning: for example PXE/Cobbler, PXE/image, or part of a cluster management suite
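As one example of the PXE piece, dnsmasq can serve both DHCP and TFTP for node provisioning. This is only a sketch; the address range and paths are assumed, not prescribed:

```shell
# /etc/dnsmasq.conf -- hand out addresses and point PXE clients
# at a bootloader served over TFTP
#   dhcp-range=10.1.0.100,10.1.0.200,12h
#   dhcp-boot=pxelinux.0
#   enable-tftp
#   tftp-root=/srv/tftp
```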
Authentication
- Flat files -- passwd, group, shadow entries
- NIS -- network access to central flat files
- LDAP -- read/write access to a dynamic tree structure of account and other information
- Host equivalency
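On the client side, gluing LDAP into account lookups usually comes down to nsswitch.conf; a minimal sketch (the username is hypothetical):

```shell
# /etc/nsswitch.conf -- consult local flat files first, then LDAP
#   passwd: files ldap
#   group:  files ldap
#   shadow: files ldap

# Verify that a directory account resolves on a node:
getent passwd jdoe
```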
Cluster Networking
- Hardware management: lights-out management
- External: public interfaces to the cluster
- Internal: general node-to-node communication
- Storage: access to network filesystems
- Interconnect: high-speed, low-latency for multi-node jobs
Some of these can share a medium.
Interconnect
In the most recent Top 500 list (http://top500.org) there were 224 installations relying on InfiniBand, 100 using Gigabit Ethernet, and 88 using 10 Gigabit Ethernet.
- Ethernet: latency of 50-125 μs (GbE), 5-50 μs (10GbE), ~5 μs (RoCEE)
- InfiniBand: latency of 1.3 μs (QDR), 0.7 μs (FDR-10/FDR), 0.5 μs (EDR)
Parallel Filesystem
- Lustre - http://lustre.org/
- PanFS - http://www.panasas.com/
- GPFS - http://www-03.ibm.com/software/products/en/software
Parallel filesystems take the general approach of separating filesystem metadata from the storage. Lustre and PanFS have dedicated nodes for metadata (MDS or director blades). GPFS distributes metadata throughout the cluster.
Cluster Management
- Automates the building of a cluster
- Some way to easily maintain cluster system consistency
- The ability to automate cluster maintenance tasks
- Offers some way to monitor cluster health and performance
Cluster Management Software
The resource manager knows the state of the various resources on the cluster and maintains a list of the jobs that are requesting resources. The scheduler, using the information from the resource manager, selects jobs from the queue for execution.
- Rocks (http://www.rocksclusters.org/wordpress/)
- Bright Cluster Manager (http://www.brightcomputing.com/Bright-Cluster-Manager)
- xCAT (Extreme Cluster/Cloud Administration Toolkit) (http://sourceforge.net/p/xcat/wiki/Main_Page/)
Configuration Management
While booting from a central boot server can make it easier to ensure that the OS on each compute node (or, at least, each type of compute node) has an identical setup/install, there are still files which wind up being more dynamic. Some such files are the passwd/group/shadow and hosts files.
- Rsync
- Cfengine
- Chef
- Puppet
- Salt
Software Installation and Management
All Linux distros have some sort of package management tool. For Redhat/CentOS/Scientific-based clusters, this is rpm and yum. Debian has dpkg and apt.
In any case, pre-packaged software tends to assume that it is going to be installed in a specific place on the machine and that it will be the only version of that software on the machine. On a cluster, it may be necessary to look at software installation differently from a standard Linux machine:
- Install to a global filesystem
- Keep the boot image as small as possible
- Maintain multiple versions
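One way to reconcile "multiple versions" with "global filesystem" is a per-version install prefix; a sketch using an autotools-style build, where the path layout and package name are assumptions:

```shell
# Layout: /opt/apps/<name>/<version> on a globally mounted filesystem,
# so versions coexist and nothing lands in the node boot image
./configure --prefix=/opt/apps/fftw/3.3.4
make
make install
# A later version installs alongside, e.g. under /opt/apps/fftw/3.3.5
```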
Software installation and management
There are a couple of tools useful for navigating the difficulties of maintaining user environments when dealing with multiple versions of software or software in non-standard locations.
- SoftEnv (http://www.lcrc.anl.gov/info/Software/Softenv): useful for packaging the static user environment required by packages
- Modules (http://modules.sourceforge.net/): can be used to make dynamic changes to a user's environment.
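A typical Modules session, assuming the site has written modulefiles for its installs (the package name here is hypothetical):

```shell
module avail              # list software packaged as modules
module load fftw/3.3.4    # prepend this version's PATH/LD_LIBRARY_PATH
module list               # show currently loaded modules
module unload fftw/3.3.4  # undo the environment changes
```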
Resource Manager/Scheduler
- Accepts job submissions, maintains a queue of jobs
- Allocates nodes/resources and starts jobs on compute nodes
- Schedules waiting jobs
Available options:
- SGE (Sun Grid Engine)
- LSF / OpenLava (Load Sharing Facility)
- PBS (Portable Batch System): OpenPBS, Torque
- SLURM
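A minimal batch script, in PBS/Torque syntax since PBS is one of the options listed above; the job name, queue, resources, and program are placeholders:

```shell
#!/bin/bash
#PBS -N example_job
#PBS -l nodes=2:ppn=8         # 2 nodes, 8 processors per node
#PBS -l walltime=01:00:00
#PBS -q batch
cd "$PBS_O_WORKDIR"           # Torque starts jobs in $HOME by default
mpirun ./my_parallel_app
```

Submitted with qsub and monitored with qstat, the resource manager queues the job and the scheduler decides when and where it runs.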
Best Practices
Here is a quick overview of the general functions to secure a cluster:
- Risk avoidance
- Deterrence
- Prevention
- Detection
- Recovery
The priority of these will depend on your security approach.
Risk Avoidance
- Provide the minimum of services necessary
- Grant the least privileges necessary
- Install the minimum software necessary
The simpler the environment, the fewer the vectors available for attack.
Deterrence
- Limit the discoverability of the cluster
- Publish acceptable use policies

Prevention
- Fix known issues (patching)
- Configure services for minimal functionality
- Restrict user access and authority
- Document actions and changes
Detection
- Monitor the cluster
- Integrate feedback from the users
- Set alerts and automated response

Recovery
- Backups
- Documentation
- Define acceptable loss