
- Planning
  - About
  - Hardware
    - ZooKeeper
      - Nodes
      - CPU
      - Memory
      - Disk
      - Network
    - Bookie
      - Nodes
      - CPU
      - Memory
      - Disk
    - Broker
      - Nodes
      - CPU
      - Memory
      - Disk
    - Proxy
      - Nodes
      - CPU, Memory
      - Disk
  - Co-Location
  - Java
  - Network
    - Port Configuration
- Installation
  - Bare Metal Deployment
  - Ansible Deployment
  - Kubernetes Deployment
    - Helm Chart
- Configure
  - General
    - JAVA_HOME
    - PULSAR_MEM
    - PULSAR_GC
  - ZooKeeper
    - dataDir and dataLogDir
  - BookKeeper
    - Memory
    - DB Ledger storage configuration
    - dbStorage_rocksDB_blockCacheSize
    - journalDirectory and ledgerDirectories
    - journalSyncData
    - Thread Settings
  - Broker
    - numHttpServerThreads
    - Default Backlog Quota Limit and Retention Policy
    - Delete Inactive Topics
    - Default Retention Policy
    - Other Policies
  - Proxy
  - Co-Location
- Monitor
  - Metrics Basics
    - Where are the Metrics?
    - Internal and External Measurements
    - Application Health Checks
    - Metric Coverage
  - Broker Metrics
  - Bookie Metrics
  - Client Monitoring
  - Interceptors
  - Backlog Monitoring
  - End-to-End Monitoring
  - Summary
- Scale
- Upgrade
- Administer
  - Tenant Operations
    - Understanding Clusters, Roles and Permissions
  - Namespace Operations
    - Storage Quota & Backlog
    - Dispatch Rate
    - Retention
    - TTL
  - Topic Operations
  - Function Operations
  - Connector Operations
  - Partition Operations
- Troubleshoot

Pulsar Operation Guide

# Capacity Planning
Before deploying a Pulsar cluster, you need to plan the hardware it will run on. This section
provides guidelines for sizing the hardware for your Pulsar cluster.

## About

Apache Pulsar is a high-performance, cloud-native distributed event streaming system. It runs
well on commodity x86 servers and in the major virtualization environments, and it supports
most common network hardware and Linux operating systems.

## Hardware

### ZooKeeper

ZooKeeper requires Java 8 or later (JDK 8+). It runs as an ensemble of ZooKeeper servers;
three servers is the minimum recommended ensemble size, and they should run on separate
machines.

Pulsar uses ZooKeeper only for periodic coordination and configuration-related tasks, *not* for
regular message reads and writes, so relatively lightweight machines or VMs are sufficient.
#### Nodes

It is recommended to run 3 to 5 ZooKeeper servers.

#### CPU

It is recommended to have 2+ CPU cores.

#### Memory

ZooKeeper is an in-memory, consistent data store, so it is critical to allocate sufficient memory
for it. 2+ GB of memory is sufficient for most Pulsar deployments. However, if you are planning
to have multiple millions of topics in one Pulsar cluster, allocate correspondingly more memory
to the ZooKeeper machines.

#### Disk

It is recommended to use separate disks for the transaction log and for snapshot storage:

- an SSD, or an HDD behind a RAID controller with a battery-backed write cache, to store the
transaction log;
- a separate disk (or RAID volume) with larger capacity to store ZooKeeper snapshots.

Each write to ZooKeeper must be persisted to the transaction log before the client receives an
acknowledgement, so using an SSD reduces ZooKeeper write latency.

#### Network

A fast and reliable network is critical for ZooKeeper. Modern data center network speeds of 1
GbE or 10 GbE should be sufficient. However, if you expect millions of topics in one Pulsar
cluster, the ZooKeeper data size can reach multiple GB. In that case, make sure there is enough
bandwidth for ZooKeeper to transfer snapshots when a follower falls behind, so that snapshot
transfers do not saturate the network; 10 GbE is recommended in this case.

### Bookie

Bookies require Java 8 or later (JDK 8+).

#### Nodes

The number of bookies depends on the replication factor (RF) used to configure your topics. It is
always recommended to have at least RF + 1 bookies in your cluster, so that you can lose one
bookie and continue functioning.

By default, Pulsar configures RF to 2, which means Pulsar stores 2 replicas of each message. In
that case, you need 3 bookies to tolerate losing 1 bookie without impacting service
availability.

The RF is configured by the following settings in `conf/broker.conf`; you can override them by
setting a persistence policy at the namespace level (see the example after the snippet below).

```
# Number of bookies to use when creating a ledger
managedLedgerDefaultEnsembleSize=2

# Number of copies to store for each message
managedLedgerDefaultWriteQuorum=2

# Number of guaranteed copies (acks to wait before write is complete)
managedLedgerDefaultAckQuorum=2
```

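To override these defaults for a specific namespace, you can set a namespace-level persistence
policy, for example with `pulsar-admin` (the tenant/namespace name below is only a placeholder):

```
bin/pulsar-admin namespaces set-persistence my-tenant/my-namespace \
  --bookkeeper-ensemble 3 \
  --bookkeeper-write-quorum 3 \
  --bookkeeper-ack-quorum 2 \
  --ml-mark-delete-max-rate 0
```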

#### CPU

Bookies are not CPU-intensive processes. For most workloads, bookies are I/O-bound, in which
case 2+ or 4+ CPU cores are good enough. However, if your applications are not able to batch
messages effectively, the bookies will eventually become request-bound and CPU-bound. If that
happens, it is better to have more CPU cores.

#### Memory

Bookies have their own memory management and use JVM direct memory extensively. It is
recommended to allocate enough direct memory to the JVM running the bookie. If your
workload is expected to read historic data (i.e. message backlog) frequently, it is also
recommended to leave enough memory for the filesystem page cache.

That said, more memory is better, but it depends on your traffic, storage retention and data
size. 8+ GB of memory is a good configuration to start with and is sufficient for most
deployments.

#### Disk

It is recommended to use separate disks for the journal and the ledger storage.

- An SSD, or an HDD behind a RAID controller with a battery-backed write cache, is
recommended for the journal. It helps reduce bookie write latency.
- A separate disk or a separate set of disks with large enough capacity is recommended
for the ledger storage. If you have a set of disks, you don't need to RAID them; JBOD
is good enough since bookies can handle JBOD natively.

The journal disk matters because each write to a bookie is persisted (via explicit fsync) to the
journal before the client receives an acknowledgement.

### Broker

#### Nodes

The dominant factor when planning the number of brokers is network bandwidth. You need to
ensure you have enough network bandwidth to support your traffic. The minimum number of
brokers is 1; however, it is recommended to have 2+ brokers for high service availability, so
that you can tolerate losing a broker.

#### CPU

Brokers are not CPU-intensive processes. For most workloads, brokers are network-bound, in
which case 2+ or 4+ CPU cores are good enough. However, if your applications are not able to
batch messages effectively, or they end up having a lot of topics, producers and consumers, the
brokers will eventually become request-bound and CPU-bound. If this happens, more CPU cores
will help.

#### Memory

Brokers cache messages for dispatching. If you have a relatively large number of consumers per
topic, or a large number of topics, consider allocating more memory to the broker; the more the
better. 8+ GB of memory is a good configuration to start with and is sufficient for most
deployments. If your traffic is relatively low, you can allocate a smaller amount of memory.

#### Disk

Brokers are stateless; they do not store any state or data locally, so there are no specific disk
requirements.

### Proxy

The proxy is an optional component of a Pulsar deployment. You can think of it as a
Pulsar-protocol-aware TCP proxy: it performs topic lookups and forwards requests to the right
owner brokers.

You only need to install proxies when your producers and consumers can *NOT* access, or are
*NOT* allowed to access, brokers directly.

#### Nodes

The dominant factor when planning the number of proxies is network bandwidth. You need to
ensure you have enough network bandwidth to support your traffic. The minimum number of
proxies is 1; however, it is recommended to have 2+ proxies for high service availability, so
that you can tolerate losing a proxy.

#### CPU, Memory

Proxies are neither CPU- nor memory-intensive processes. 2+ CPU cores and 2+ GB of memory
are a good configuration to start with.

#### Disk

Proxies are stateless; they do not store any state or data locally, so there are no specific disk
requirements.

## Co-Location
Pulsar's multi-layered architecture is good for scalability and availability. However, sometimes
you might want to start with a smaller Pulsar cluster. In that case, you can consider co-locating
*brokers* with *bookies* to reduce the number of machines required to run Pulsar.

## Java
Java 8 is the recommended version for running Pulsar.
The Garbage-First (G1) collector is the recommended GC algorithm.

## Network

A fast and reliable network is important for performance. Modern data center network speeds of
1 GbE or 10 GbE should be sufficient.

### Port Configuration

Apache Pulsar requires the following network ports to run. Depending on how Pulsar is deployed
in your environment, the administrator can open the relevant ports on the network and host
side.

| Component | Default Port | Description |
| :--: | :--: | :-- |
| Broker / Proxy | 6650 | Pulsar binary protocol communication port (plain text) |
| Broker / Proxy | 6651 | Pulsar binary protocol communication port (tls encrypted) |
| Broker / Proxy | 8080 | Pulsar admin restful service port (plain text, aka http) |
| Broker / Proxy | 8443 | Pulsar admin restful service port (tls encrypted, aka https) |
| Bookie | 8000 | Bookie admin restful api port (plain text, aka http) |
| Bookie | 3181 | Bookie protocol communication port (plain text or tls encrypted) |
| ZooKeeper | 2181 | ZooKeeper communication port for clients |
| ZooKeeper | 2888 | ZooKeeper communication port for followers |
| ZooKeeper | 3888 | ZooKeeper communication port for leader election |

# Installation
After planning the hardware for your Pulsar cluster, you are ready to choose a deployment
method to deploy Pulsar to your machines. Here are a few deployment options:

- Step-by-step Manual Deployment
- Ansible Deployment
- Kubernetes Deployment

See the Configure section for details on how to tune and configure a production-grade Pulsar
cluster.

# Configure
This section discusses how to configure and tune a production-grade cluster.

## General

`bin/pulsar` and `bin/pulsar-daemon` use `conf/pulsar_env.sh` to configure a few environment
variables, such as the Java location, log directory, JVM memory and GC settings.

Here are a few settings that you can customize.

### JAVA_HOME

If you have multiple Java installations on your machine, you can specify which one to use by
setting the `JAVA_HOME` environment variable in `conf/pulsar_env.sh` to point to the Java
installation directory.
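For example, a line like the following in `conf/pulsar_env.sh` (the path is only a placeholder;
point it at your actual JDK installation):

```
JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
```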

### PULSAR_MEM

You should configure `PULSAR_MEM` to allocate memory for running Pulsar components.

- It is recommended to configure `-Xms` equal to `-Xmx`, so that the JVM allocates the
whole heap during initialization.
- Since Pulsar uses direct memory extensively, make sure you allocate enough direct
memory to the JVM. It is recommended to configure `-XX:MaxDirectMemorySize`
to be twice `-Xmx`.
- Make sure the total size of JVM heap and direct memory does *NOT* exceed the total
available memory on your machine. It is also recommended to leave at least 1~2 GB for
your operating system.

For example, on an 8 GB machine you can configure the maximum heap to 2 GB, direct memory
to 4 GB, and leave the remaining 2 GB for the operating system. Your `PULSAR_MEM` would
then be:
```
-Xms2g -Xmx2g -XX:MaxDirectMemorySize=4g
```

`PULSAR_MEM` can also be overridden by exporting it as an environment variable.
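In `conf/pulsar_env.sh` this typically takes the form below (a sketch based on the stock file;
the `${VAR:-default}` syntax means an externally exported `PULSAR_MEM` takes precedence over
the value in the file):

```
PULSAR_MEM=${PULSAR_MEM:-"-Xms2g -Xmx2g -XX:MaxDirectMemorySize=4g"}
```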

### PULSAR_GC

You can configure `PULSAR_GC` to specify the GC algorithm and its settings for running Pulsar
components. Pulsar uses the G1 GC algorithm by default. The default value (shown below) is
reasonably good. However, you should consider aligning the values of
`-XX:ParallelGCThreads=` and `-XX:ConcGCThreads=` with 1 to 2 times the number of CPU cores
on your machine.

For example, if your machine has 4 CPU cores, you can configure `-XX:ParallelGCThreads=`
and `-XX:ConcGCThreads=` to be 4 or 8.

```
-XX:+UseG1GC -XX:MaxGCPauseMillis=10 -XX:+ParallelRefProcEnabled
-XX:+UnlockExperimentalVMOptions -XX:+AggressiveOpts -XX:+DoEscapeAnalysis
-XX:ParallelGCThreads=32 -XX:ConcGCThreads=32 -XX:G1NewSizePercent=50
-XX:+DisableExplicitGC -XX:-ResizePLAB
```
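As a sketch, on a 4-core machine the same flag set with the thread counts aligned to the core
count would look like:

```
-XX:+UseG1GC -XX:MaxGCPauseMillis=10 -XX:+ParallelRefProcEnabled
-XX:+UnlockExperimentalVMOptions -XX:+AggressiveOpts -XX:+DoEscapeAnalysis
-XX:ParallelGCThreads=4 -XX:ConcGCThreads=4 -XX:G1NewSizePercent=50
-XX:+DisableExplicitGC -XX:-ResizePLAB
```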

## ZooKeeper

The default ZooKeeper configuration (`conf/zookeeper.conf`) provided in the Pulsar distribution
is good enough for most Pulsar deployments. However, if you want to tune your ZooKeeper
cluster, here are a few settings you should take care of.

### dataDir and dataLogDir

As mentioned in the "Planning" section, it is recommended to use separate disks for the
ZooKeeper transaction log and snapshot storage. If you do allocate two disks for this purpose,
configure the following two settings to point to the right disks. The default `zookeeper.conf`
file in the Pulsar distribution doesn't contain a `dataLogDir` setting, in which case you can
add an entry `dataLogDir=</path/to/txnlog/dir>` to the `zookeeper.conf` file.

```
# the directory where the snapshot is stored.
dataDir=data/zookeeper

# where the transaction log is written
dataLogDir=data/zookeeper/txlog
```

## BookKeeper

The default bookie configuration (`conf/bookkeeper.conf`) provided in the Pulsar distribution is
good enough for most Pulsar deployments. However, if you want to tune your bookie machines,
here are a few settings you should look into.

### Memory

As mentioned in the "Planning" section, the memory on a bookie machine should be allocated to
the following:

- The JVM running the bookie process.
- The filesystem page cache.
- The operating system.

It is always a good practice to reserve at least 1~2 GB of memory for your operating system.
This helps avoid the OS killing processes when it runs out of memory.

The remaining memory can then be allocated to the JVM running the bookie process and to the
filesystem. A simple approach is to allocate ½ of the remaining memory to the JVM and the
other ½ to the filesystem.

Of the memory allocated to the JVM, you can use ⅓ for heap memory and ⅔ for direct memory.

Let’s use a 14GB memory machine as an example.

- Reserve 2GB memory for OS. 12GB is left for JVM and filesystem.
- 6GB allocated for JVM and 6GB for filesystem.
- ⅓ of 6GB can be used for JVM heap and ⅔ of 6GB can be used for JVM direct memory.

So you can configure `PULSAR_MEM` as follows in `conf/pulsar_env.sh`:

```
-Xms2g -Xmx2g -XX:MaxDirectMemorySize=4g
```

### DB Ledger storage configuration

Pulsar uses DbLedgerStorage for storing ledger entries. DbLedgerStorage uses RocksDB to store
the index of ledger entries. All the default settings for DbLedgerStorage (prefixed with
`dbStorage_`) are good for most Pulsar deployments when the JVM is configured with more than
2 GB of direct memory.

If you are planning to run bookies in a memory-constrained environment (where the bookie's
JVM is configured with less than 2 GB of direct memory), you should reduce the size of the
caches and buffers used by DbLedgerStorage. These settings are:

```
dbStorage_writeCacheMaxSizeMb=512
dbStorage_readAheadCacheMaxSizeMb=256
dbStorage_rocksDB_blockCacheSize=268435456
dbStorage_rocksDB_writeBufferSizeMB=64
```

A set of good principles to follow when configuring DbLedgerStorage (a worked example follows
the list):

- Make `dbStorage_writeCacheMaxSizeMb` twice the size of
`dbStorage_readAheadCacheMaxSizeMb`.
- Make `dbStorage_rocksDB_writeBufferSizeMB` ⅛ of
`dbStorage_writeCacheMaxSizeMb`.
- Reduce `dbStorage_rocksDB_blockCacheSize` accordingly.
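For example, under the assumption of a bookie JVM with roughly 1 GB of direct memory, applying
these principles might yield something like:

```
# Write cache: the largest of the DbLedgerStorage buffers
dbStorage_writeCacheMaxSizeMb=256

# Read-ahead cache: half the write cache
dbStorage_readAheadCacheMaxSizeMb=128

# RocksDB write buffer: 1/8 of the write cache
dbStorage_rocksDB_writeBufferSizeMB=32

# RocksDB block cache, reduced accordingly (value in bytes: 128 MB)
dbStorage_rocksDB_blockCacheSize=134217728
```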

### dbStorage_rocksDB_blockCacheSize

`dbStorage_rocksDB_blockCacheSize` controls the size of the RocksDB block cache. RocksDB is
used for storing the entry index. It is recommended to configure this value large enough to hold
a significant portion of the index database, so that index entries don't have to be constantly
swapped in and out.

Increase the block cache size to 1~2 GB or even higher if you are:

- keeping data for a much longer duration for rewinding access;
- expecting to have a backlog of messages.
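For example, a 1 GB block cache corresponds to the following setting (the value is specified in
bytes):

```
# 1 GB = 1 * 1024 * 1024 * 1024 bytes
dbStorage_rocksDB_blockCacheSize=1073741824
```
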
### journalDirectory and ledgerDirectories

As mentioned in the "Planning" section, it is recommended to use separate disks for the journal
and the ledger storage. If you do allocate two (sets of) disks for this purpose, configure the
following two settings to point to the right disks.

```
journalDirectory=data/bookkeeper/journal
ledgerDirectories=data/bookkeeper/ledgers
```

If you are using multiple SSDs as journal disks, you can list them all in `journalDirectories` to
increase throughput.

```
journalDirectories=data/bookkeeper/journal1,data/bookkeeper/journal2
```

Even if you have *ONLY* one SSD as the journal disk, it is recommended to create multiple
subdirectories on that disk and list them in `journalDirectories`. This increases write
throughput.

```
journalDirectories=data/bookkeeper/journal/dir1,data/bookkeeper/journal/dir2
```

### journalSyncData

By default, bookies persist entries (via explicit fsync) to disk before acknowledging clients, so
the disk fsync latency dominates the BookKeeper client write latency. The fsync latency is
around half a millisecond when using an SSD or an HDD with a battery-backed write cache.

However, if you are using an HDD without a write cache, an SSD with high fsync latency, or a
virtual machine that doesn't have dedicated disks, it is recommended to turn off fsync in order
to get good latency. This is done by setting `journalSyncData` to `false`. When
`journalSyncData` is disabled, you can also set `journalMaxGroupWaitMSec` to a relatively high
value such as 100 ms to reduce the number of fsyncs issued by the bookie journal threads.

Please be aware that when `journalSyncData` is disabled, BookKeeper relies on replication for
data durability. You can lose data if all replicas are lost at the same time.
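As a sketch, the corresponding settings in `conf/bookkeeper.conf` would look like this:

```
# Rely on replication rather than per-entry fsync for durability
journalSyncData=false

# Group journal writes together to reduce the number of fsyncs
journalMaxGroupWaitMSec=100
```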

### Thread Settings

If you have more than 8 CPU cores, or a relatively small number of CPU cores (such as 1),
consider adjusting the thread counts below to be 1 to 2 times the number of CPU cores on your
machine.

```
# Number of threads that should handle write requests. if zero, the writes would
# be handled by netty threads directly.
numAddWorkerThreads=0

# Number of threads that should handle read requests. if zero, the reads would
# be handled by netty threads directly.
numReadWorkerThreads=8

# Number of threads that should be used for high priority requests
# (i.e. recovery reads and adds, and fencing).
numHighPriorityWorkerThreads=8

# The number of threads that should handle journal callbacks
numJournalCallbackThreads=8
```

## Broker

The default broker configuration (`conf/broker.conf`) provided in the Pulsar distribution is good
enough for most Pulsar deployments. However, if you want to tune your broker installation, here
are a few settings you should look into.

### numHttpServerThreads

If your machine has only 1 CPU core, set `numHttpServerThreads` to a value larger than 2;
otherwise the broker will hit an issue in the Jetty server.
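For example, in `conf/broker.conf`:

```
# Number of threads used by the broker's HTTP server
numHttpServerThreads=4
```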

### Default Backlog Quota Limit and Retention Policy


By default, Pulsar is configured with a 10 GB backlog quota limit and the
`producer_request_hold` retention policy. These settings are reasonable for a multi-tenant
deployment. However, if you are deploying a Pulsar cluster for a relatively small number of
tenants, consider the following:

- Increase the default backlog quota limit if you plan to store messages for a much longer
duration; otherwise you can quickly hit the backlog quota limit.
- Choose the right retention policy. If you can't tolerate consumers missing data, choose
`producer_request_hold`; however, it causes producers to time out when a topic hits the
quota limit. If you can tolerate consumers missing data, or want to make sure produces
always succeed, choose `consumer_backlog_eviction`, in which case Pulsar evicts the
oldest messages from the slowest consumer's backlog.

Plan this policy carefully before you roll out Pulsar to production.
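These defaults are controlled in `conf/broker.conf`. As a sketch (verify the exact setting names
against your Pulsar version):

```
# Default backlog quota limit per topic, in GB
backlogQuotaDefaultLimitGB=10

# Default policy applied when the backlog quota is exceeded:
# producer_request_hold, producer_exception or consumer_backlog_eviction
backlogQuotaDefaultRetentionPolicy=producer_request_hold
```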

### Delete Inactive Topics

By default, Pulsar deletes inactive topics (topics with no subscriptions), checking for them at
an interval controlled by `brokerDeleteInactiveTopicsFrequencySeconds`.

```
# Enable the deletion of inactive topics
brokerDeleteInactiveTopicsEnabled=true

# How often to check for inactive topics
brokerDeleteInactiveTopicsFrequencySeconds=60
```

If you want to avoid a topic being deleted due to inactivity, you can do either of the following:

- Disable deleting inactive topics by setting `brokerDeleteInactiveTopicsEnabled` to `false`.
- Or, configure a suitable default retention policy that keeps data for a configured period of
time or up to a configured size.

### Default Retention Policy

By default, Pulsar "deletes" messages once they are acknowledged by all subscriptions. If there
are no subscriptions on a topic, messages are treated as "acknowledged" immediately. If there
are no subscriptions, the topic is also treated as inactive after a given duration (see the
"Delete Inactive Topics" section above).

If you want to keep all messages for a while, you can set a retention policy for a namespace, or
you can configure a default retention policy for all namespaces with the following settings:

```
# Default message retention time
defaultRetentionTimeInMinutes=0

# Default retention size
defaultRetentionSizeInMB=0
```
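To set retention for a single namespace instead of changing the broker-wide defaults, you can
use `pulsar-admin`, for example (the tenant/namespace name and values are only placeholders):

```
bin/pulsar-admin namespaces set-retention my-tenant/my-namespace \
  --time 7d \
  --size 100G
```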
