
- Planning
  - About
  - Hardware
    - ZooKeeper
      - Nodes
      - CPU
      - Memory
      - Disk
      - Network
    - Bookie
      - Nodes
      - CPU
      - Memory
      - Disk
    - Broker
      - Nodes
      - CPU
      - Memory
      - Disk
    - Proxy
      - Nodes
      - CPU, Memory
      - Disk
  - Co-Location
  - Java
  - Network
    - Port Configuration
- Installation
  - Bare Metal Deployment
  - Ansible Deployment
  - Kubernetes Deployment
    - Helm Chart
- Configure
  - General
    - JAVA_HOME
    - PULSAR_MEM
    - PULSAR_GC
  - ZooKeeper
    - dataDir and dataLogDir
  - BookKeeper
    - Memory
    - DB Ledger storage configuration
    - dbStorage_rocksDB_blockCacheSize
    - journalDirectory and ledgerDirectories
    - journalSyncData
    - Thread Settings
  - Broker
    - numHttpServerThreads
    - Default Backlog Quota Limit and Retention Policy
    - Delete Inactive Topics
    - Default Retention Policy
    - Other Policies
  - Proxy
  - Co-Location
- Monitor
  - Metrics Basics
    - Where are the Metrics?
    - Internal and External Measurements
    - Application Health Checks
    - Metric Coverage
  - Broker Metrics
  - Bookie Metrics
  - Client Monitoring
  - Interceptors
  - Backlog Monitoring
  - End-to-End Monitoring
  - Summary
- Scale
- Upgrade
- Administer
  - Tenant Operations
    - Understanding Clusters, Roles and Permissions
  - Namespace Operations
    - Storage Quota & Backlog
    - Dispatch Rate
    - Retention
    - TTL
  - Topic Operations
  - Function Operations
  - Connector Operations
  - Partition Operations
- Troubleshoot

Pulsar Operation Guide

# Capacity Planning
Before deploying a Pulsar cluster, you need to plan the hardware it will run on. This section
provides guidelines for sizing the hardware for your Pulsar cluster.

## About

Apache Pulsar is a high-performance, cloud-native distributed event streaming system. It runs
well on commodity x86 servers and in the major virtualization environments, and it supports
most common network hardware and Linux operating systems.

## Hardware

### ZooKeeper

ZooKeeper requires Java 8 or later (JDK 8+). It runs as an ensemble of ZooKeeper servers;
three servers is the minimum recommended ensemble size, and they should run on separate
machines.

Pulsar uses ZooKeeper only for periodic coordination and configuration-related tasks, *not* for
regular message reads and writes, so relatively lightweight machines or VMs are sufficient.
#### Nodes

It is recommended to run 3 to 5 ZooKeeper servers.

#### CPU

It is recommended to have 2+ CPU cores.

#### Memory

ZooKeeper is an in-memory, consistent data store, so it is critical to allocate sufficient memory
for it. 2+ GB of memory is sufficient for most Pulsar deployments. However, if you are planning
to have multiple millions of topics in one Pulsar cluster, allocate correspondingly more memory
to the ZooKeeper machines.

#### Disk

It is recommended to use separate disks for the transaction log and for snapshot storage:

- an SSD, or an HDD behind a RAID controller with a battery-backed write cache, to store the
transaction log;
- a separate disk (or RAID volume) with larger capacity to store ZooKeeper snapshots.

Each write to ZooKeeper must be persisted to the transaction log before the client receives an
acknowledgement, so using an SSD reduces ZooKeeper write latency.

#### Network

A fast and reliable network is critical for ZooKeeper. Modern data center network speeds of 1
GbE or 10 GbE should be sufficient. However, if you expect millions of topics in one Pulsar
cluster, the ZooKeeper data size can reach multiple GB. In that case, make sure there is enough
bandwidth for ZooKeeper to transfer snapshots when a follower falls behind, so that snapshot
transfers do not saturate the network; 10 GbE is recommended in this case.

### Bookie

Bookies require Java 8 or later (JDK 8+).

#### Nodes

The number of bookies depends on the replication factor (RF) used to configure your topics. It is
always recommended to have at least RF + 1 bookies in your cluster, so that you can lose one
bookie and continue functioning.

By default, Pulsar configures RF to 2, which means Pulsar stores 2 replicas of each message. In
that case, you need 3 bookies to tolerate losing 1 bookie without impacting service
availability.

The RF is configured by the following settings in `conf/broker.conf`; you can override them by
setting a persistence policy at the namespace level (see the example after the snippet below).

```
# Number of bookies to use when creating a ledger
managedLedgerDefaultEnsembleSize=2

# Number of copies to store for each message
managedLedgerDefaultWriteQuorum=2

# Number of guaranteed copies (acks to wait before write is complete)
managedLedgerDefaultAckQuorum=2
```

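To override these defaults for a specific namespace, you can set a namespace-level persistence
policy, for example with `pulsar-admin` (the tenant/namespace name below is only a placeholder):

```
bin/pulsar-admin namespaces set-persistence my-tenant/my-namespace \
  --bookkeeper-ensemble 3 \
  --bookkeeper-write-quorum 3 \
  --bookkeeper-ack-quorum 2 \
  --ml-mark-delete-max-rate 0
```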

#### CPU

Bookies are not CPU-intensive processes. For most workloads, bookies are I/O-bound, in which
case 2+ or 4+ CPU cores are good enough. However, if your applications are not able to batch
messages effectively, the bookies will eventually become request-bound and CPU-bound. If that
happens, it is better to have more CPU cores.

#### Memory

Bookies have their own memory management and use JVM direct memory extensively. It is
recommended to allocate enough direct memory to the JVM running the bookie. If your
workload is expected to read historic data (i.e. message backlog) frequently, it is also
recommended to leave enough memory for the filesystem page cache.

That said, more memory is better, but it depends on your traffic, storage retention and data
size. 8+ GB of memory is a good configuration to start with and is sufficient for most
deployments.

#### Disk

It is recommended to use separate disks for the journal and the ledger storage.

- An SSD, or an HDD behind a RAID controller with a battery-backed write cache, is
recommended for the journal. It helps reduce bookie write latency.
- A separate disk or a separate set of disks with large enough capacity is recommended
for the ledger storage. If you have a set of disks, you don't need to RAID them; JBOD
is good enough since bookies can handle JBOD natively.

The journal disk matters because each write to a bookie is persisted (via explicit fsync) to the
journal before the client receives an acknowledgement.

### Broker

#### Nodes

The dominant factor when planning the number of brokers is network bandwidth. You need to
ensure you have enough network bandwidth to support your traffic. The minimum number of
brokers is 1; however, it is recommended to have 2+ brokers for high service availability, so
that you can tolerate losing a broker.

#### CPU

Brokers are not CPU-intensive processes. For most workloads, brokers are network-bound, in
which case 2+ or 4+ CPU cores are good enough. However, if your applications are not able to
batch messages effectively, or they end up having a lot of topics, producers and consumers, the
brokers will eventually become request-bound and CPU-bound. If this happens, more CPU cores
will help.

#### Memory

Brokers cache messages for dispatching. If you have a relatively large number of consumers per
topic, or a large number of topics, consider allocating more memory to the broker; the more the
better. 8+ GB of memory is a good configuration to start with and is sufficient for most
deployments. If your traffic is relatively low, you can allocate a smaller amount of memory.

#### Disk

Brokers are stateless; they do not store any state or data locally, so there are no specific disk
requirements.

### Proxy

The proxy is an optional component of a Pulsar deployment. You can think of it as a
Pulsar-protocol-aware TCP proxy: it performs topic lookups and forwards requests to the right
owner brokers.

You only need to install proxies when your producers and consumers can *NOT* access, or are
*NOT* allowed to access, brokers directly.

#### Nodes

The dominant factor when planning the number of proxies is network bandwidth. You need to
ensure you have enough network bandwidth to support your traffic. The minimum number of
proxies is 1; however, it is recommended to have 2+ proxies for high service availability, so
that you can tolerate losing a proxy.

#### CPU, Memory

Proxies are neither CPU- nor memory-intensive processes. 2+ CPU cores and 2+ GB of memory
are a good configuration to start with.

#### Disk

Proxies are stateless; they do not store any state or data locally, so there are no specific disk
requirements.

## Co-Location
Pulsar's multi-layered architecture is good for scalability and availability. However, sometimes
you might want to start with a smaller Pulsar cluster. In that case, you can consider co-locating
*brokers* with *bookies* to reduce the number of machines required to run Pulsar.

## Java
Java 8 is the recommended version for running Pulsar.
The Garbage-First (G1) collector is the recommended GC algorithm.

## Network

A fast and reliable network is important for performance. Modern data center network speeds of
1 GbE or 10 GbE should be sufficient.

### Port Configuration

Apache Pulsar requires the following network ports to run. Depending on how Pulsar is deployed
in your environment, the administrator can open the relevant ports on the network and host
side.

| Component | Default Port | Description |
| :--: | :--: | :-- |
| Broker / Proxy | 6650 | Pulsar binary protocol communication port (plain text) |
| Broker / Proxy | 6651 | Pulsar binary protocol communication port (tls encrypted) |
| Broker / Proxy | 8080 | Pulsar admin restful service port (plain text, aka http) |
| Broker / Proxy | 8443 | Pulsar admin restful service port (tls encrypted, aka https) |
| Bookie | 8000 | Bookie admin restful api port (plain text, aka http) |
| Bookie | 3181 | Bookie protocol communication port (plain text or tls encrypted) |
| ZooKeeper | 2181 | ZooKeeper communication port for clients |
| ZooKeeper | 2888 | ZooKeeper communication port for followers |
| ZooKeeper | 3888 | ZooKeeper communication port for leader election |

# Installation
After planning the hardware for your Pulsar cluster, you are ready to choose a deployment
method to deploy Pulsar to your machines. Here are a few deployment options:

- Step-by-step Manual Deployment
- Ansible Deployment
- Kubernetes Deployment

See the Configure section for details on how to tune and configure a production-grade Pulsar
cluster.

# Configure
This section discusses how to configure and tune a production-grade cluster.

## General

`bin/pulsar` and `bin/pulsar-daemon` use `conf/pulsar_env.sh` to configure a few environment
variables, such as the Java location, log directory, JVM memory and GC settings.

Here are a few settings that you can customize.

### JAVA_HOME

If you have multiple Java installations on your machine, you can specify which one to use by
setting the `JAVA_HOME` environment variable in `conf/pulsar_env.sh` to point to the Java
installation directory.
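For example, a line like the following in `conf/pulsar_env.sh` (the path is only a placeholder;
point it at your actual JDK installation):

```
JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
```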

### PULSAR_MEM

You should configure `PULSAR_MEM` to allocate memory for running Pulsar components.

- It is recommended to configure `-Xms` equal to `-Xmx`, so that the JVM allocates the
whole heap during initialization.
- Since Pulsar uses direct memory extensively, make sure you allocate enough direct
memory to the JVM. It is recommended to configure `-XX:MaxDirectMemorySize`
to be twice `-Xmx`.
- Make sure the total size of JVM heap and direct memory does *NOT* exceed the total
available memory on your machine. It is also recommended to leave at least 1~2 GB for
your operating system.

For example, on an 8 GB machine you can configure the maximum heap to 2 GB, direct memory
to 4 GB, and leave the remaining 2 GB for the operating system. Your `PULSAR_MEM` would
then be:
```
-Xms2g -Xmx2g -XX:MaxDirectMemorySize=4g
```

`PULSAR_MEM` can also be overridden by exporting it as an environment variable.
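In `conf/pulsar_env.sh` this typically takes the form below (a sketch based on the stock file;
the `${VAR:-default}` syntax means an externally exported `PULSAR_MEM` takes precedence over
the value in the file):

```
PULSAR_MEM=${PULSAR_MEM:-"-Xms2g -Xmx2g -XX:MaxDirectMemorySize=4g"}
```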

### PULSAR_GC

You can configure `PULSAR_GC` to specify the GC algorithm and its settings for running Pulsar
components. Pulsar uses the G1 GC algorithm by default. The default value (shown below) is
reasonably good. However, you should consider aligning the values of
`-XX:ParallelGCThreads=` and `-XX:ConcGCThreads=` with 1 to 2 times the number of CPU cores
on your machine.

For example, if your machine has 4 CPU cores, you can configure `-XX:ParallelGCThreads=`
and `-XX:ConcGCThreads=` to be 4 or 8.

```
-XX:+UseG1GC -XX:MaxGCPauseMillis=10 -XX:+ParallelRefProcEnabled
-XX:+UnlockExperimentalVMOptions -XX:+AggressiveOpts -XX:+DoEscapeAnalysis
-XX:ParallelGCThreads=32 -XX:ConcGCThreads=32 -XX:G1NewSizePercent=50
-XX:+DisableExplicitGC -XX:-ResizePLAB
```
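As a sketch, on a 4-core machine the same flag set with the thread counts aligned to the core
count would look like:

```
-XX:+UseG1GC -XX:MaxGCPauseMillis=10 -XX:+ParallelRefProcEnabled
-XX:+UnlockExperimentalVMOptions -XX:+AggressiveOpts -XX:+DoEscapeAnalysis
-XX:ParallelGCThreads=4 -XX:ConcGCThreads=4 -XX:G1NewSizePercent=50
-XX:+DisableExplicitGC -XX:-ResizePLAB
```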

## ZooKeeper

The default ZooKeeper configuration (`conf/zookeeper.conf`) provided in the Pulsar distribution
is good enough for most Pulsar deployments. However, if you want to tune your ZooKeeper
cluster, here are a few settings you should take care of.

### dataDir and dataLogDir

As mentioned in the "Planning" section, it is recommended to use separate disks for the
ZooKeeper transaction log and snapshot storage. If you do allocate two disks for this purpose,
configure the following two settings to point to the right disks. The default `zookeeper.conf`
file in the Pulsar distribution doesn't contain a `dataLogDir` setting, in which case you can
add an entry `dataLogDir=</path/to/txnlog/dir>` to the `zookeeper.conf` file.

```
# the directory where the snapshot is stored.
dataDir=data/zookeeper

# where the transaction log is written
dataLogDir=data/zookeeper/txlog
```

## BookKeeper

The default bookie configuration (`conf/bookkeeper.conf`) provided in the Pulsar distribution is
good enough for most Pulsar deployments. However, if you want to tune your bookie machines,
here are a few settings you should look into.

### Memory

As mentioned in the "Planning" section, the memory on a bookie machine should be allocated to
the following:

- The JVM running the bookie process.
- The filesystem page cache.
- The operating system.

It is always a good practice to reserve at least 1~2 GB of memory for your operating system.
This helps avoid the OS killing processes when it runs out of memory.

The remaining memory can then be allocated to the JVM running the bookie process and to the
filesystem. A simple approach is to allocate ½ of the remaining memory to the JVM and the
other ½ to the filesystem.

Of the memory allocated to the JVM, you can use ⅓ for heap memory and ⅔ for direct memory.

Let’s use a 14GB memory machine as an example.

- Reserve 2GB memory for OS. 12GB is left for JVM and filesystem.
- 6GB allocated for JVM and 6GB for filesystem.
- ⅓ of 6GB can be used for JVM heap and ⅔ of 6GB can be used for JVM direct memory.

So you can configure `PULSAR_MEM` as follows in `conf/pulsar_env.sh`:

```
-Xms2g -Xmx2g -XX:MaxDirectMemorySize=4g
```

### DB Ledger storage configuration

Pulsar uses DbLedgerStorage for storing ledger entries. DbLedgerStorage uses RocksDB to store
the index of ledger entries. All the default settings for DbLedgerStorage (prefixed with
`dbStorage_`) are good for most Pulsar deployments when the JVM is configured with more than
2 GB of direct memory.

If you are planning to run bookies in a memory-constrained environment (where the bookie's
JVM is configured with less than 2 GB of direct memory), you should reduce the size of the
caches and buffers used by DbLedgerStorage. These settings are:

```
dbStorage_writeCacheMaxSizeMb=512
dbStorage_readAheadCacheMaxSizeMb=256
dbStorage_rocksDB_blockCacheSize=268435456
dbStorage_rocksDB_writeBufferSizeMB=64
```

A set of good principles to follow when configuring DbLedgerStorage (a worked example follows
the list):

- Make `dbStorage_writeCacheMaxSizeMb` twice the size of
`dbStorage_readAheadCacheMaxSizeMb`.
- Make `dbStorage_rocksDB_writeBufferSizeMB` ⅛ of
`dbStorage_writeCacheMaxSizeMb`.
- Reduce `dbStorage_rocksDB_blockCacheSize` accordingly.
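For example, under the assumption of a bookie JVM with roughly 1 GB of direct memory, applying
these principles might yield something like:

```
# Write cache: the largest of the DbLedgerStorage buffers
dbStorage_writeCacheMaxSizeMb=256

# Read-ahead cache: half the write cache
dbStorage_readAheadCacheMaxSizeMb=128

# RocksDB write buffer: 1/8 of the write cache
dbStorage_rocksDB_writeBufferSizeMB=32

# RocksDB block cache, reduced accordingly (value in bytes: 128 MB)
dbStorage_rocksDB_blockCacheSize=134217728
```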

### dbStorage_rocksDB_blockCacheSize

`dbStorage_rocksDB_blockCacheSize` controls the size of the RocksDB block cache. RocksDB is
used for storing the entry index. It is recommended to configure this value large enough to hold
a significant portion of the index database, so that index entries don't have to be constantly
swapped in and out.

Increase the block cache size to 1~2 GB or even higher if you are:

- keeping data for a much longer duration for rewinding access;
- expecting to have a backlog of messages.
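For example, a 1 GB block cache corresponds to the following setting (the value is specified in
bytes):

```
# 1 GB = 1 * 1024 * 1024 * 1024 bytes
dbStorage_rocksDB_blockCacheSize=1073741824
```
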
### journalDirectory and ledgerDirectories

As mentioned in the "Planning" section, it is recommended to use separate disks for the journal
and the ledger storage. If you do allocate two (sets of) disks for this purpose, configure the
following two settings to point to the right disks.

```
journalDirectory=data/bookkeeper/journal
ledgerDirectories=data/bookkeeper/ledgers
```

If you are using multiple SSDs as journal disks, you can list them all in `journalDirectories` to
increase throughput.

```
journalDirectories=data/bookkeeper/journal1,data/bookkeeper/journal2
```

Even if you have *ONLY* one SSD as the journal disk, it is recommended to create multiple
subdirectories on that disk and list them in `journalDirectories`. This increases write
throughput.

```
journalDirectories=data/bookkeeper/journal/dir1,data/bookkeeper/journal/dir2
```

### journalSyncData

By default, bookies persist entries (via explicit fsync) to disk before acknowledging clients, so
the disk fsync latency dominates the BookKeeper client write latency. The fsync latency is
around half a millisecond when using an SSD or an HDD with a battery-backed write cache.

However, if you are using an HDD without a write cache, an SSD with high fsync latency, or a
virtual machine that doesn't have dedicated disks, it is recommended to turn off fsync in order
to get good latency. This is done by setting `journalSyncData` to `false`. When
`journalSyncData` is disabled, you can also set `journalMaxGroupWaitMSec` to a relatively high
value such as 100 ms to reduce the number of fsyncs issued by the bookie journal threads.

Please be aware that when `journalSyncData` is disabled, BookKeeper relies on replication for
data durability. You can lose data if all replicas are lost at the same time.
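As a sketch, the corresponding settings in `conf/bookkeeper.conf` would look like this:

```
# Rely on replication rather than per-entry fsync for durability
journalSyncData=false

# Group journal writes together to reduce the number of fsyncs
journalMaxGroupWaitMSec=100
```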

### Thread Settings

If you have more than 8 CPU cores, or a relatively small number of CPU cores (such as 1),
consider adjusting the thread counts below to be 1 to 2 times the number of CPU cores on your
machine.

```
# Number of threads that should handle write requests. if zero, the writes would
# be handled by netty threads directly.
numAddWorkerThreads=0

# Number of threads that should handle read requests. if zero, the reads would
# be handled by netty threads directly.
numReadWorkerThreads=8

# Number of threads that should be used for high priority requests
# (i.e. recovery reads and adds, and fencing).
numHighPriorityWorkerThreads=8

# The number of threads that should handle journal callbacks
numJournalCallbackThreads=8
```

## Broker

The default broker configuration (`conf/broker.conf`) provided in the Pulsar distribution is good
enough for most Pulsar deployments. However, if you want to tune your broker installation, here
are a few settings you should look into.

### numHttpServerThreads

If your machine has only 1 CPU core, set `numHttpServerThreads` to a value larger than 2;
otherwise the broker will hit an issue in the Jetty server.
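For example, in `conf/broker.conf`:

```
# Number of threads used by the broker's HTTP server
numHttpServerThreads=4
```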

### Default Backlog Quota Limit and Retention Policy


By default, Pulsar is configured with a 10 GB backlog quota limit and the
`producer_request_hold` retention policy. These settings are reasonable for a multi-tenant
deployment. However, if you are deploying a Pulsar cluster for a relatively small number of
tenants, consider the following:

- Increase the default backlog quota limit if you plan to store messages for a much longer
duration; otherwise you can quickly hit the backlog quota limit.
- Choose the right retention policy. If you can't tolerate consumers missing data, choose
`producer_request_hold`; however, it causes producers to time out when a topic hits the
quota limit. If you can tolerate consumers missing data, or want to make sure produces
always succeed, choose `consumer_backlog_eviction`, in which case Pulsar evicts the
oldest messages from the slowest consumer's backlog.

Plan this policy carefully before you roll out Pulsar to production.
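These defaults are controlled in `conf/broker.conf`. As a sketch (verify the exact setting names
against your Pulsar version):

```
# Default backlog quota limit per topic, in GB
backlogQuotaDefaultLimitGB=10

# Default policy applied when the backlog quota is exceeded:
# producer_request_hold, producer_exception or consumer_backlog_eviction
backlogQuotaDefaultRetentionPolicy=producer_request_hold
```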

### Delete Inactive Topics

By default, Pulsar deletes inactive topics (topics with no subscriptions), checking for them at
an interval controlled by `brokerDeleteInactiveTopicsFrequencySeconds`.

```
# Enable the deletion of inactive topics
brokerDeleteInactiveTopicsEnabled=true

# How often to check for inactive topics
brokerDeleteInactiveTopicsFrequencySeconds=60
```

If you want to avoid a topic being deleted due to inactivity, you can do either of the following:

- Disable deleting inactive topics by setting `brokerDeleteInactiveTopicsEnabled` to `false`.
- Or, configure a suitable default retention policy that keeps data for a configured period of
time or up to a configured size.

### Default Retention Policy

By default, Pulsar "deletes" messages once they are acknowledged by all subscriptions. If there
are no subscriptions on a topic, messages are treated as "acknowledged" immediately. If there
are no subscriptions, the topic is also treated as inactive after a given duration (see the
"Delete Inactive Topics" section above).

If you want to keep all messages for a while, you can set a retention policy for a namespace, or
you can configure a default retention policy for all namespaces with the following settings:

```
# Default message retention time
defaultRetentionTimeInMinutes=0

# Default retention size
defaultRetentionSizeInMB=0
```
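To set retention for a single namespace instead of changing the broker-wide defaults, you can
use `pulsar-admin`, for example (the tenant/namespace name and values are only placeholders):

```
bin/pulsar-admin namespaces set-retention my-tenant/my-namespace \
  --time 7d \
  --size 100G
```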
