To evaluate storage options we’ll set up a Kubernetes cluster in AWS with a Rook cluster deployed alongside tools for debugging and metrics collection. Then we’ll deploy a pod with 3 different volumes to compare Rook block storage (backed by instance store), EBS gp2, and EBS io1 (SSD) (see EBS volume types).
kops was used to set up the Kubernetes cluster in AWS. For this test the Kubernetes nodes are mid-range i3.2xlarge instances, with instance storage (1900 GiB NVMe SSD) and up to 10 Gigabit networking performance. Kubernetes is installed on Ubuntu 16.04 LTS with 3 nodes plus the master.
Once kops create finishes, we should have a fully functioning Kubernetes cluster; kops even sets up the context for the newly created cluster in the kubeconfig.
$ brew install kops
$ kops create cluster $NAME \
--node-count 3 \
--zones "us-west-2c" \
--node-size "i3.2xlarge" \
--master-size "m3.medium" \
--master-zones "us-west-2c" \
--admin-access x.x.x.x/32 \
--api-loadbalancer-type public \
--cloud aws \
--image "ami-2606e05e" \
--kubernetes-version 1.8.2 \
--ssh-access x.x.x.x/32 \
--ssh-public-key ~/.ssh/me.pub \
--yes
...
$ kubectl get nodes
NAME                                          STATUS    AGE       VERSION
ip-172-20-42-159.us-west-2.compute.internal   Ready     1m        v1.8.2
ip-172-20-42-37.us-west-2.compute.internal    Ready     2m        v1.8.2
ip-172-20-53-26.us-west-2.compute.internal    Ready     1m        v1.8.2
ip-172-20-55-209.us-west-2.compute.internal   Ready     1m        v1.8.2
Rook is easy to get running; we’ll use the latest release, currently 0.6. It will manage a Ceph cluster configured to our spec. First, the rook-operator needs to be deployed.
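k8s/rook-operator.yaml bundles the operator’s RBAC objects and its Deployment. Below is a trimmed-down sketch, not the full manifest: the ClusterRole and ClusterRoleBinding are omitted, and the image tag and arguments are assumptions based on the stock Rook v0.6 release.
# Sketch of k8s/rook-operator.yaml (ClusterRole/ClusterRoleBinding omitted for brevity)
apiVersion: v1
kind: ServiceAccount
metadata:
  name: rook-operator
---
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: rook-operator
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: rook-operator
    spec:
      serviceAccountName: rook-operator
      containers:
      - name: rook-operator
        image: rook/rook:v0.6.2   # assumed 0.6.x tag; use the tag shipped with the release
        args: ["operator"]        # run the rook binary in operator mode
Applying the full manifest creates the RBAC objects and the operator Deployment: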
$ kubectl create -f k8s/rook-operator.yaml
clusterrole "rook-operator" created
serviceaccount "rook-operator" created
clusterrolebinding "rook-operator" created
deployment "rook-operator" created
The Rook cluster is configured to deliver block storage using local disks (instance store) attached directly to the hosts running our instances. The disk devices are selected with deviceFilter; the instance store device is /dev/nvme0n1. The OSD placement is set in the cluster.yaml such that only two nodes are used for storage; the third node will host the test client pod.
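A rough sketch of what k8s/rook-cluster.yaml could look like follows. Field names are based on the Rook v0.6 Cluster CRD, and rook-storage-node is a hypothetical label applied to the two nodes meant to run OSDs; check both against the Rook 0.6 documentation.
# Sketch of k8s/rook-cluster.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: rook
---
apiVersion: rook.io/v1alpha1
kind: Cluster
metadata:
  name: rook-eval
  namespace: rook
spec:
  dataDirHostPath: /var/lib/rook
  storage:
    useAllNodes: true
    useAllDevices: false
    deviceFilter: "^nvme0"              # select the instance-store NVMe device (/dev/nvme0n1)
  placement:
    osd:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: rook-storage-node    # hypothetical label on the two storage nodes
              operator: In
              values: ["true"]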
Create the cluster, and notice that rook-operator spins up several pods in the rook namespace to manage the Ceph components:
$ kubectl create -f k8s/rook-cluster.yaml
namespace "rook" created
cluster "rook-eval" created
$ kubectl get pods --namespace rook
NAME                              READY     STATUS    RESTARTS   AGE
rook-api-3588729152-s0dxw         1/1       Running   0          46s
rook-ceph-mgr0-1957545771-bsg7h   1/1       Running   0          46s
rook-ceph-mgr1-1957545771-cth8i   1/1       Running   0          47s
rook-ceph-mon0-t1m3z              1/1       Running   0          1m
rook-ceph-mon1-mkdl4              1/1       Running   0          1m
rook-ceph-mon2-bv1qk              1/1       Running   0          1m
rook-ceph-osd-0027l               1/1       Running   0          46s
rook-ceph-osd-2p90r               1/1       Running   0          46s
The Rook storage Pool and StorageClass have to be defined next. Note that we are creating 2 replicas to provide resiliency on par with EBS.
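A minimal sketch of what k8s/rook-storageclass.yaml contains, following the Rook v0.6 block storage example:
# Sketch of k8s/rook-storageclass.yaml
apiVersion: rook.io/v1alpha1
kind: Pool
metadata:
  name: replicapool
  namespace: rook
spec:
  replicated:
    size: 2                   # two copies of every object, kept on separate OSDs
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-block
provisioner: rook.io/block
parameters:
  pool: replicapool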
$ kubectl create -f k8s/rook-storageclass.yaml
pool "replicapool" created
storageclass "rook-block" created
The Rook toolbox was started to provide better visibility into the Rook cluster.
$ kubectl create -f k8s/rook-tools.yaml
pod "rook-tools" created
$ kubectl -n rook exec -it rook-tools -- rookctl status
OVERALL STATUS: OK

USAGE:
TOTAL       USED        DATA      AVAILABLE
5.18 TiB    6.00 GiB    0 B       5.18 TiB

MONITORS:
NAME             ADDRESS                 IN QUORUM   STATUS
rook-ceph-mon0   100.66.63.114:6790/0    true        OK
rook-ceph-mon1   100.66.113.38:6790/0    true        OK
rook-ceph-mon2   100.68.185.191:6790/0   true        OK

MGRs:
NAME             STATUS
rook-ceph-mgr0   Active
rook-ceph-mgr1   Standby

OSDs:
TOTAL     UP        IN        FULL      NEAR FULL
2         2         2         false     false

PLACEMENT GROUPS (100 total):
STATE          COUNT
active+clean   100
At this point we have Kubernetes with a Rook cluster up and running in AWS; we’ll provision storage in the next steps.
Let’s create a Persistent Volume Claim (PVC) for a Rook block device and attach it to our test pod along with different types of EBS volumes. Before we proceed, the EBS volumes have to be created; note the volume IDs output by each command, as they’ll be used in our manifests later:
$ aws ec2 create-volume --availability-zone=us-west-2b --size=120 --volume-type=gp2
...
$ aws ec2 create-volume --availability-zone=us-west-2b --size=120 --volume-type=io1 --iops=6000
...
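On the Rook side, the block device is requested through a PVC against the rook-block StorageClass. A minimal sketch, with an illustrative claim name:
# Hypothetical PVC for the Rook volume
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rook-eval-claim
spec:
  storageClassName: rook-block   # dynamically provisioned by the Rook provisioner
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 120Gi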
Let’s create a pod with 3 volumes to run our FIO tests against:
- Rook volume mounted at /eval (120 GiB, ext4)
- EBS io1 (Provisioned IOPS = 6K) volume mounted at /eval-io1 (120 GiB, ext4)
- EBS gp2 (General Purpose) volume mounted at /eval-gp2 (120 GiB, ext4)
Note that the blog writeup focused on the performance of the io1 volume for a high-IOPS scenario.
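The deployment manifest wires all three volumes into the test pod. A rough sketch of what k8s/test-deployment.yaml might contain; the container image, claim name, and EBS volume IDs are placeholders, so substitute the IDs returned by aws ec2 create-volume above:
# Sketch of k8s/test-deployment.yaml
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: rookeval
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: rookeval
    spec:
      containers:
      - name: rookeval
        image: ubuntu:16.04              # placeholder; any image with fio installed will do
        command: ["sleep", "infinity"]   # keep the pod running so we can exec fio in it
        volumeMounts:
        - name: rook-vol
          mountPath: /eval
        - name: ebs-io1
          mountPath: /eval-io1
        - name: ebs-gp2
          mountPath: /eval-gp2
      volumes:
      - name: rook-vol
        persistentVolumeClaim:
          claimName: rook-eval-claim        # the PVC defined above
      - name: ebs-io1
        awsElasticBlockStore:
          volumeID: vol-0aaaaaaaaaaaaaaaa   # placeholder: io1 volume ID
          fsType: ext4
      - name: ebs-gp2
        awsElasticBlockStore:
          volumeID: vol-0bbbbbbbbbbbbbbbb   # placeholder: gp2 volume ID
          fsType: ext4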
$ kubectl create -f k8s/test-deployment.yaml
deployment "rookeval" created
$ kubectl get pods
NAME                             READY     STATUS    RESTARTS   AGE
rook-operator-3796250946-wwh3g   1/1       Running   0          10m
rookeval-1632283128-2c08m        1/1       Running   0          31s
$ kubectl exec -it rookeval-1632283128-2c08m -- df -Th --exclude-type=tmpfs
Filesystem   Type      Size   Used   Avail   Use%   Mounted on
overlay      overlay   7.7G   2.8G   5.0G    36%    /
/dev/xvdbe   ext4      118G   60M    112G    1%     /eval-io1
/dev/rbd0    ext4      118G   60M    112G    1%     /eval
/dev/xvdbi   ext4      118G   60M    112G    1%     /eval-gp2
/dev/xvda1   ext4      7.7G   2.8G   5.0G    36%    /etc/hosts
All looks good; we’re ready to finally proceed with the FIO tests. Our test pod now has 3 different storage types to compare. It would be interesting to also test Rook clusters backed by EBS volumes of different types, and to try different instance types, since they provide different controllers and drives. Next time, perhaps.
- Rook had higher IOPS in all scenarios except 4K sequential writes. Random writes are the important ones for transactional IO, so I’m focusing on those for now.
- Need to analyze streaming IO with HDD-based devices at some point to compare sequential read/write performance.
- For the reported results, the testing pod is running on a different node than the storage nodes. For comparison, the test pod was also run on the storage nodes for a hyper-converged scenario. Since Ceph is consistent, an IO operation completes only after all replicas are written, so it makes no noticeable difference where the pod lands on your cluster, at least in my testing setup where network capacity is the same across all nodes.