Yarn Tuning Guide

The document provides instructions for configuring a Hadoop cluster. It includes specifying machine configurations like RAM, CPUs, and disks. It then allocates CPU cores and memory to operating systems and Hadoop services like HDFS and YARN. Cluster size and YARN configurations are also set, including container resource limits. Container capacity estimates and configuration checks are performed.


Machine Configuration

STEP 1: Worker Host Configuration


Enter your likely machine configuration in the input boxes below. If you
are uncertain which machines you will buy, enter minimum values that
match what you expect to purchase.

Host Components | Quantity | Size | Total | Description / Notes

RAM | | 256G | 256G | Node memory in Gigabytes
CPU | 4 | 6 | 48 | Number of CPUs and the number of HW cores per CPU. The calculation of vcores below includes HyperThreading support.
HyperThreading CPU | yes | | | Does the CPU support HyperThreading?
HDD (Hard Disk Drive) | 24 | 3T | 72T | Number of Hard Drives and size per drive in JBOD Configuration
Ethernet | 2 | 1G | 2G | Number of Ethernet connections and the transfer speed
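The CPU row above multiplies sockets by hardware cores and doubles the result when HyperThreading is enabled. A minimal sketch of that arithmetic (variable names are illustrative):

```python
# Vcore arithmetic behind the CPU row above (illustrative names).
cpus = 4            # physical CPU sockets
cores_per_cpu = 6   # hardware cores per CPU
hyperthreading = True

# With HyperThreading each hardware core presents two threads,
# so the usual convention is to count each core as two vcores.
vcores = cpus * cores_per_cpu * (2 if hyperthreading else 1)
print(vcores)  # 48
```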

STEP 2: Worker Host Planning


Now that you have your base Host configuration from Step 1, use the
table below to allocate resources, mainly CPU and memory, to the various
software components that run on the host.

Service | Category | CPU (cores) | Memory (MB) | Notes

Operating System | Overhead | 1 | 8192 | Most operating systems use 4-8GB minimum.
Other services | Overhead | 0 | 0 | Enter the required cores or memory for non-CDH services not part of the OS.
Cloudera Manager agent | Overhead | 1 | 1024 | Allocate 1GB and 1 vcore for Cloudera Manager agents, which track resource usage on a host.
HDFS DataNode | CDH | 1 | 1024 | Allocation for the HDFS DataNode heap: default 1GB and 1 vcore.
YARN NodeManager | CDH | 1 | 1024 | Allocation for the YARN NodeManager heap: default 1GB and 1 vcore.
Impala daemon | CDH | 0 | 0 | (Optional Service) Suggestion: Allocate at least 16GB memory when using Impala.
HBase RegionServer | CDH | 0 | 0 | (Optional Service) Suggestion: Allocate no more than 12-16GB memory when using HBase RegionServers.
Solr Server | CDH | 0 | 0 | (Optional Service) Suggestion: Minimum 1GB for Solr server. More might be necessary depending on index sizes.
Kudu Server | CDH | 0 | 0 | (Optional Service) Suggestion: Minimum 1GB for Kudu Tablet server. More might be necessary depending on data sizes.
Available Container Resources | | 44 | 250880

Container resources

Physical Cores to Vcores Multiplier | 1 | Set this ratio based on the expected number of concurrent threads in a container per thread core. Default is 1.
YARN Available Vcores | 44 | This value will be used in STEP 4 for YARN Configuration
YARN Available Memory | 250880 | This value will be used in STEP 4 for YARN Configuration
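The "Available Container Resources" row is simply the host totals from Step 1 minus the per-service reservations above. A small sketch of that subtraction, assuming the default allocations from the table:

```python
# Recomputing "Available Container Resources" (illustrative).
total_vcores = 48
total_memory_mb = 256 * 1024  # 256 GB host memory

# (cores, memory_mb) reserved per service, from the Step 2 table
reservations = {
    "Operating System": (1, 8192),
    "Cloudera Manager agent": (1, 1024),
    "HDFS DataNode": (1, 1024),
    "YARN NodeManager": (1, 1024),
}

reserved_cores = sum(c for c, _ in reservations.values())
reserved_mb = sum(m for _, m in reservations.values())

yarn_vcores = total_vcores - reserved_cores      # 44
yarn_memory_mb = total_memory_mb - reserved_mb   # 250880
print(yarn_vcores, yarn_memory_mb)
```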

STEP 3: Cluster Size


Enter the number of nodes you have (or expect to have) in the cluster.

Number of Worker Hosts in the cluster | 10





YARN Configuration
STEP 4: YARN Configuration on Cluster
This is the first set of configuration values for your cluster. You can
set these values in YARN->Configuration.

YARN NodeManager Configuration Properties | Value | Note

yarn.nodemanager.resource.cpu-vcores | 44 | Copied from STEP 2 "Available Resources"
yarn.nodemanager.resource.memory-mb | 250880 | Copied from STEP 2 "Available Resources"

STEP 5: Verify YARN Settings on Cluster


Go to the Resource Manager Web UI (usually
http://<ResourceManagerIP>:8088/) and verify that "Memory Total" and
"Vcores Total" match the expected values below. If your cluster has no
bad nodes, the numbers should match exactly.

Resource Manager Property to Check | Value | Note

Expected Value for "Vcores Total" | 440 | Calculated from STEP 2 "YARN Available Vcores" and STEP 3
Expected Value for "Memory Total" (in GB) | 2450 | Calculated from STEP 2 "YARN Available Memory" and STEP 3
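The expected totals are the per-host values from STEP 2 multiplied by the node count from STEP 3. A quick illustrative recomputation:

```python
# Expected Resource Manager totals for a healthy cluster (sketch).
nodes = 10
vcores_per_node = 44
memory_mb_per_node = 250880

vcores_total = nodes * vcores_per_node               # 440
memory_total_gb = nodes * memory_mb_per_node / 1024  # 2450.0
print(vcores_total, memory_total_gb)
```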

STEP 6: Verify Container Settings on Cluster


In order to have YARN jobs run cleanly, you need to configure the
container properties.

YARN Container Configuration Properties (Vcores) | Value | Description

yarn.scheduler.minimum-allocation-vcores | 1 | Minimum vcore reservation for a container
yarn.scheduler.maximum-allocation-vcores | 44 | Maximum vcore reservation for a container
yarn.scheduler.increment-allocation-vcores | 1 | Vcore allocations must be a multiple of this value

YARN Container Configuration Properties (Memory) | Value | Description

yarn.scheduler.minimum-allocation-mb | 1024 | Minimum memory reservation for a container in MegaByte
yarn.scheduler.maximum-allocation-mb | 250880 | Maximum memory reservation for a container in MegaByte
yarn.scheduler.increment-allocation-mb | 512 | Memory allocations must be a multiple of this value in MegaByte
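The minimum/increment pair controls how YARN rounds each request onto its allocation grid. The sketch below mimics that normalization with the values above; it is a simplified model for illustration, not the actual Hadoop scheduler code:

```python
# Simplified model of YARN's memory-request normalization.
import math

MIN_MB = 1024        # yarn.scheduler.minimum-allocation-mb
INCREMENT_MB = 512   # yarn.scheduler.increment-allocation-mb
MAX_MB = 250880      # yarn.scheduler.maximum-allocation-mb

def normalize_memory(request_mb):
    """Round a memory request up to the allocation grid, then cap it."""
    granted = max(request_mb, MIN_MB)
    granted = int(math.ceil(granted / INCREMENT_MB) * INCREMENT_MB)
    return min(granted, MAX_MB)

print(normalize_memory(1000))    # 1024 (raised to the minimum)
print(normalize_memory(1100))    # 1536 (rounded up to a 512 MB multiple)
print(normalize_memory(300000))  # 250880 (capped at the maximum)
```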

STEP 6A: Cluster Container Capacity


This section will tell you the capacity of your cluster (in terms of
containers).

Cluster Container Estimates | Value

Max possible number of containers, based on memory configuration | 2450
Max possible number of containers, based on vcore configuration | 440
Container number based on 2 containers per disk spindle | 480
Min possible number of containers, based on memory configuration | 10
Min possible number of containers, based on vcore configuration | 10
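The estimates above divide cluster-wide resources by the per-container limits from STEP 6, plus a rule of thumb of two containers per disk spindle. Recomputed for illustration:

```python
# Cluster container-capacity estimates (illustrative recomputation).
nodes = 10
disks_per_node = 24
node_memory_mb = 250880
node_vcores = 44
min_mb, max_mb = 1024, 250880      # scheduler memory limits
min_vcores, max_vcores = 1, 44     # scheduler vcore limits

max_by_memory = nodes * node_memory_mb // min_mb   # 2450
max_by_vcores = nodes * node_vcores // min_vcores  # 440
by_spindles = nodes * disks_per_node * 2           # 480
min_by_memory = nodes * node_memory_mb // max_mb   # 10
min_by_vcores = nodes * node_vcores // max_vcores  # 10
print(max_by_memory, max_by_vcores, by_spindles, min_by_memory, min_by_vcores)
```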

STEP 6B: Container Sanity Checking


This section will do some basic checking of your container parameters in
STEP 6 against the hosts.
Sanity Check | Status | Description
Scheduler maximum vcores must be larger than minimum | GOOD | yarn.scheduler.maximum-allocation-vcores >= yarn.scheduler.minimum-allocation-vcores
Scheduler maximum allocation MB must be larger than minimum | GOOD | yarn.scheduler.maximum-allocation-mb >= yarn.scheduler.minimum-allocation-mb
Scheduler minimum vcores must be greater than or equal to 0 | GOOD | yarn.scheduler.minimum-allocation-vcores >= 0
Scheduler maximum vcores must be greater than or equal to 1 | GOOD | yarn.scheduler.maximum-allocation-vcores >= 1
Host vcores must be larger than scheduler minimum vcores | GOOD | yarn.nodemanager.resource.cpu-vcores >= yarn.scheduler.minimum-allocation-vcores
Host vcores must be larger than scheduler maximum vcores | GOOD | yarn.nodemanager.resource.cpu-vcores >= yarn.scheduler.maximum-allocation-vcores
Host allocation MB must be larger than scheduler minimum | GOOD | yarn.nodemanager.resource.memory-mb >= yarn.scheduler.minimum-allocation-mb
Host allocation MB must be larger than scheduler maximum | GOOD | yarn.nodemanager.resource.memory-mb >= yarn.scheduler.maximum-allocation-mb
Small container limit | GOOD | If yarn.scheduler.minimum-allocation-mb is less than 1GB, containers will likely get killed by YARN due to OutOfMemory issues
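The checks above are plain inequalities over the configured properties. A compact encoding (the dictionary keys are the real YARN property names; the harness itself is illustrative):

```python
# Step 6B sanity checks encoded as inequalities over the configuration.
conf = {
    "yarn.nodemanager.resource.cpu-vcores": 44,
    "yarn.nodemanager.resource.memory-mb": 250880,
    "yarn.scheduler.minimum-allocation-vcores": 1,
    "yarn.scheduler.maximum-allocation-vcores": 44,
    "yarn.scheduler.minimum-allocation-mb": 1024,
    "yarn.scheduler.maximum-allocation-mb": 250880,
}

checks = [
    # scheduler limits are internally consistent
    conf["yarn.scheduler.maximum-allocation-vcores"] >= conf["yarn.scheduler.minimum-allocation-vcores"],
    conf["yarn.scheduler.maximum-allocation-mb"] >= conf["yarn.scheduler.minimum-allocation-mb"],
    conf["yarn.scheduler.minimum-allocation-vcores"] >= 0,
    conf["yarn.scheduler.maximum-allocation-vcores"] >= 1,
    # host resources cover the scheduler limits
    conf["yarn.nodemanager.resource.cpu-vcores"] >= conf["yarn.scheduler.minimum-allocation-vcores"],
    conf["yarn.nodemanager.resource.cpu-vcores"] >= conf["yarn.scheduler.maximum-allocation-vcores"],
    conf["yarn.nodemanager.resource.memory-mb"] >= conf["yarn.scheduler.minimum-allocation-mb"],
    conf["yarn.nodemanager.resource.memory-mb"] >= conf["yarn.scheduler.maximum-allocation-mb"],
    # small-container limit: below 1GB, containers tend to die with OOM
    conf["yarn.scheduler.minimum-allocation-mb"] >= 1024,
]
print("GOOD" if all(checks) else "CHECK FAILED")
```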


MapReduce Configuration
STEP 7: MapReduce Configuration

For CDH 5.5 and later, we recommend specifying only the heap size or the
container size for map and reduce tasks. The value that is not specified
is calculated using the setting mapreduce.job.heap.memory-mb.ratio:
following Cloudera Manager, the heap size is derived from the ratio and
the container size.

Application Master Configuration Properties | Value | Description

yarn.app.mapreduce.am.resource.cpu-vcores | 1 | AM container vcore reservation
yarn.app.mapreduce.am.resource.mb | 1024 | AM container memory reservation in MegaByte
yarn.app.mapreduce.am.command-opts (-Xmx) | 800 | AM Java heap size in MegaByte

Task Auto Heap Sizing

Use task auto heap sizing | yes
mapreduce.job.heap.memory-mb.ratio | 0.8 | Ratio between the container size and task heap size

Map Task Configuration Properties

mapreduce.map.cpu.vcores | 1 | Map task vcore reservation
mapreduce.map.memory.mb | 1024 | Map task memory reservation in MegaByte
mapreduce.map.java.opts (ignored) | 800 | Map task Java heap size in MegaByte
mapreduce.task.io.sort.mb | 400 | Spill/Sort memory reservation

Reduce Task Configuration Properties

mapreduce.reduce.cpu.vcores | 1 | Reduce task vcore reservation
mapreduce.reduce.memory.mb | 1024 | Reduce task memory reservation in MegaByte
mapreduce.reduce.java.opts (ignored) | 800 | Reduce task Java heap size in MegaByte
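With auto heap sizing, the task heap follows from the container size and the ratio, and the spill/sort buffer should then land between 40% and 60% of that heap. The arithmetic below is illustrative (Cloudera Manager's exact rounding may differ; the sheet shows 800 MB rather than the raw 819), and the stray 0.48828125 that appears in the source sheet is exactly this spill-to-heap ratio:

```python
# Heap auto-sizing and spill/sort ratio check (illustrative arithmetic).
heap_ratio = 0.8          # mapreduce.job.heap.memory-mb.ratio
map_container_mb = 1024   # mapreduce.map.memory.mb
sort_mb = 400             # mapreduce.task.io.sort.mb

# Derived heap: container size times ratio (the sheet rounds this to 800).
map_heap_mb = int(map_container_mb * heap_ratio)  # 819

# Spill/sort buffer as a fraction of the derived heap; should be 0.4-0.6.
sort_fraction = sort_mb / (map_container_mb * heap_ratio)
print(round(sort_fraction, 8))  # 0.48828125
```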

STEP 7A: MapReduce Sanity Checking


Sanity check MapReduce settings against container minimum/maximum
properties.
Application Master Sanity Checks | Status | Description
AM vcore request must fit within scheduler limits | GOOD | yarn.scheduler.minimum-allocation-vcores <= yarn.app.mapreduce.am.resource.cpu-vcores <= yarn.scheduler.maximum-allocation-vcores
AM memory request must fit within scheduler limits | GOOD | yarn.scheduler.minimum-allocation-mb <= yarn.app.mapreduce.am.resource.mb <= yarn.scheduler.maximum-allocation-mb
Container size must be large enough for java heap and overhead | GOOD | Java Heap should be between 75% and 90% of the container size: too low wastes resources, too high could lead to OOM
Ratio should be between 0.75 and 0.9 | GOOD | Java Heap should be between 75% and 90% of the container size: too low wastes resources, too high could lead to OOM

Map Task Sanity Checks | Status | Description

Map task vcore request must fit within scheduler limits | GOOD | yarn.scheduler.minimum-allocation-vcores <= mapreduce.map.cpu.vcores <= yarn.scheduler.maximum-allocation-vcores
Map task memory request must fit within scheduler limits | GOOD | yarn.scheduler.minimum-allocation-mb <= mapreduce.map.memory.mb <= yarn.scheduler.maximum-allocation-mb
Container size must be large enough for java heap and overhead | N/A | Java Heap should be between 75% and 90% of the container size: too low wastes resources, too high could lead to OOM
Spill/Sort memory should not use whole map task heap | GOOD | Make sure that the Spill/Sort memory reservation uses between 40% and 60% of the heap of a map task

Reduce Task Sanity Checks | Status | Description

Reduce task vcore request must fit within scheduler limits | GOOD | yarn.scheduler.minimum-allocation-vcores <= mapreduce.reduce.cpu.vcores <= yarn.scheduler.maximum-allocation-vcores
Reduce task memory request must fit within scheduler limits | GOOD | yarn.scheduler.minimum-allocation-mb <= mapreduce.reduce.memory.mb <= yarn.scheduler.maximum-allocation-mb
Container size must be large enough for java heap and overhead | N/A | Java Heap should be between 75% and 90% of the container size: too low wastes resources, too high could lead to OOM
