ZooKeeper
Flavio Junqueira
Benjamin Reed
Yahoo! Research
https://cwiki.apache.org/confluence/display/ZOOKEEPER/EurosysTutorial
Eurosys 2011 ‐ Tutorial
Part 1
Fundamentals
Yahoo! Portal
Search
E‐mail
Finance
Weather
News
[Figure: masters and the coordination service they rely on]
Consistency
Availability
Partition tolerance
• Centrifuge, Microsoft
– Lease service
Adya et al., USENIX NSDI, 2010
• ZooKeeper, Yahoo!
– Coordination kernel
– On Apache since 2008
Hunt et al., USENIX ATC, 2010
Example – Bigtable, HBase
• Sparse column‐oriented data storage
– Tablet: range of rows
– Unit of distribution
• Architecture
– Master
– Tablet servers
• Metadata management
– Politeness constraints
– Shards
• Crash detection
– Live workers
Part 2
The service
ZooKeeper
Introduction
• Coordination kernel
– Does not export concrete primitives
– Recipes to implement primitives
• File system based API
– Manipulate small data nodes: znodes
[Figure: replicated ZooKeeper system. Each client application uses the client library to hold a session with one server, a follower or the leader. The leader atomically broadcasts updates (for example X = 11) to the followers.]
[Figure: clients C1, C2 and C3 and the znodes /C‐1, /C‐2 and /C‐3 under the root /, holding C1, C3 and C2 respectively]
[Figure: /foo holds 10. A client calls setData "/foo", 11 and gets return ok. Another client receives a notification; /foo now holds 11.]
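The setData and notification exchange can be sketched with a toy in-memory model. This is not the ZooKeeper client library: the class ToyZK and its method names are hypothetical, and the sketch only assumes that a watch fires once and reports the path that changed, not the new value.

```python
class ToyZK:
    """Toy single-server stand-in, NOT the real ZooKeeper client API."""

    def __init__(self):
        self.data = {}      # path -> value
        self.watches = {}   # path -> list of one-shot callbacks

    def create(self, path, value):
        self.data[path] = value

    def get_data(self, path, watch=None):
        # Reading with a watch registers interest in the next change.
        if watch is not None:
            self.watches.setdefault(path, []).append(watch)
        return self.data[path]

    def set_data(self, path, value):
        self.data[path] = value
        # Watches are one-shot: remove them before firing.
        for callback in self.watches.pop(path, []):
            callback(path)
        return "ok"


zk = ToyZK()
zk.create("/foo", 10)

events = []
first_read = zk.get_data("/foo", watch=events.append)  # watcher reads 10
result = zk.set_data("/foo", 11)                       # writer gets "ok"
zk.set_data("/foo", 12)                                # no watch left to fire
second_read = zk.get_data("/foo")                      # watcher must re-read
```

The watcher learns only that /foo changed; it has to read again (and set a new watch) to see the current value.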
• Load spikes
– Undesirable
[Figure: clients C1 ... Cn with znodes /C‐1 ... /C‐m under the root /. When the znode of C1 goes away, the remaining clients C2 ... Cn receive a notification.]
[Figure: three servers ZK1, ZK2 and ZK3 each hold /config = C1. Client 2 calls setData "/config", C2 through ZK3 and gets return OK; ZK2 and ZK3 now hold C2 while ZK1 still holds C1. Client 2 tells Client 1: "I have changed the config, please read it!" Client 1 calls getData "/config" on ZK1 and gets return C1, the old value.]
• sync
– Makes operations linearizable
[Figure: a server still holds /foo = C1 after a setData has gone through the leader. The client issues sync, the server catches up with the leader, and the following getData returns /foo = C2.]
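The stale read and the effect of sync can be illustrated with a small simulation. It is a sketch under strong assumptions: the Leader and Follower classes are hypothetical, quorum commits are left out, and a follower simply applies the leader's log when it handles a write or a sync.

```python
class Leader:
    def __init__(self):
        self.log = []   # ordered (path, value) updates


class Follower:
    """Toy server that reads locally and writes through the leader."""

    def __init__(self, leader):
        self.leader = leader
        self.applied = 0   # number of log entries applied locally
        self.data = {}

    def _catch_up(self):
        for path, value in self.leader.log[self.applied:]:
            self.data[path] = value
        self.applied = len(self.leader.log)

    def set_data(self, path, value):
        # The write is ordered by the leader; this server applies it
        # before answering its own client.
        self.leader.log.append((path, value))
        self._catch_up()
        return "OK"

    def get_data(self, path):
        return self.data.get(path)   # local read, possibly stale

    def sync(self):
        # Apply everything the leader has ordered so far.
        self._catch_up()


leader = Leader()
zk1, zk3 = Follower(leader), Follower(leader)
zk3.set_data("/config", "C1")
zk1.sync()                       # both servers start with C1

zk3.set_data("/config", "C2")    # Client 2 writes through ZK3
stale = zk1.get_data("/config")  # Client 1 reads ZK1 too early
zk1.sync()                       # Client 1 issues sync first
fresh = zk1.get_data("/config")  # now the read reflects the write
```

In this model a read that follows sync on the same server cannot miss a write that completed before the sync was issued.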
Part 3
How it really works
Master/Worker System
[Figure: clients, masters, workers and the coordination service. Tasks are queued to be executed, workers execute tasks, and the tasks are monitored.]
[Figure: znode /tasks with children client1‐1, client3‐4 and client1‐6]
create("/tasks/client1-", cmds, SEQUENTIAL)
cmds is an array of String
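A minimal sketch of how SEQUENTIAL znodes give tasks a global order, using a toy in-memory model rather than the real client library. ToyZK and its methods are hypothetical; the one behaviour borrowed from ZooKeeper is that a per-parent counter is appended to the requested name as a zero-padded 10-digit suffix.

```python
class ToyZK:
    """Toy znode store, NOT the real ZooKeeper client API."""

    def __init__(self):
        self.nodes = {}      # path -> data
        self.counters = {}   # parent path -> next sequence number

    def create(self, path, data, sequential=False):
        if sequential:
            parent = path.rsplit("/", 1)[0]
            seq = self.counters.get(parent, 0)
            self.counters[parent] = seq + 1
            path = f"{path}{seq:010d}"   # counter is shared by all children
        if path in self.nodes:
            raise KeyError(f"node exists: {path}")
        self.nodes[path] = data
        return path

    def get_children(self, parent):
        prefix = parent + "/"
        return sorted(p[len(prefix):] for p in self.nodes
                      if p.startswith(prefix))


zk = ToyZK()
zk.create("/tasks", None)
a = zk.create("/tasks/client1-", ["cmd-a"], sequential=True)
b = zk.create("/tasks/client3-", ["cmd-b"], sequential=True)
c = zk.create("/tasks/client1-", ["cmd-c"], sequential=True)

# Order tasks by submission, regardless of which client queued them.
by_seq = sorted(zk.get_children("/tasks"),
                key=lambda name: name.rsplit("-", 1)[1])
```

Because the counter belongs to /tasks and not to a client, sorting by the numeric suffix recovers submission order across clients.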
[Figure: znode /assign with children worker1, worker2 and worker3]
create("/assign/worker-", "", EPHEMERAL SEQUENTIAL)
listChildren("/assign", true)
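The worker registration and the child watch on /assign can be sketched as follows. This is a toy model with hypothetical names (ToyZK, create_ephemeral_sequential, close_session), assuming only that an ephemeral znode disappears when the session that created it ends and that a child watch fires once.

```python
class ToyZK:
    """Toy model of ephemeral znodes and child watches, NOT the real API."""

    def __init__(self):
        self.owner = {}           # path -> session that created it
        self.child_watches = {}   # parent -> one-shot callbacks
        self.seq = 0

    def create_ephemeral_sequential(self, session, prefix):
        path = f"{prefix}{self.seq:010d}"
        self.seq += 1
        self.owner[path] = session
        self._fire(path)
        return path

    def list_children(self, parent, watch=None):
        if watch is not None:
            self.child_watches.setdefault(parent, []).append(watch)
        prefix = parent + "/"
        return sorted(p[len(prefix):] for p in self.owner
                      if p.startswith(prefix))

    def close_session(self, session):
        # Ephemeral znodes go away together with their session.
        for path in [p for p, s in self.owner.items() if s == session]:
            del self.owner[path]
            self._fire(path)

    def _fire(self, path):
        parent = path.rsplit("/", 1)[0]
        for callback in self.child_watches.pop(parent, []):
            callback(parent)


zk = ToyZK()
zk.create_ephemeral_sequential("session-1", "/assign/worker-")
zk.create_ephemeral_sequential("session-2", "/assign/worker-")

changes = []
before = zk.list_children("/assign", watch=changes.append)  # master watches
zk.close_session("session-1")                               # a worker crashes
after = zk.list_children("/assign")                         # master re-reads
```

The master does not poll: it learns about the crashed worker from the notification and then lists the children again to see who is left.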
[Figure: znode /master, with a master and a backup]
Master: create("/master", hostinfo, EPHEMERAL)
Backup: getdata("/master", true);
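The master and backup interaction can be sketched with a toy model. The names ToyZK, NodeExists, end_session and run_for_master are hypothetical, and the host strings are made up for illustration; the sketch assumes only that creating an existing znode fails and that an ephemeral znode is deleted when its session ends.

```python
class NodeExists(Exception):
    pass


class ToyZK:
    """Toy model of an ephemeral /master znode, NOT the real API."""

    def __init__(self):
        self.nodes = {}     # path -> (data, owning session)
        self.watches = {}   # path -> one-shot callbacks

    def create_ephemeral(self, session, path, data):
        if path in self.nodes:
            raise NodeExists(path)
        self.nodes[path] = (data, session)

    def get_data(self, path, watch=None):
        if watch is not None:
            self.watches.setdefault(path, []).append(watch)
        return self.nodes[path][0]

    def end_session(self, session):
        for path in [p for p, (_, s) in self.nodes.items() if s == session]:
            del self.nodes[path]
            for callback in self.watches.pop(path, []):
                callback(path)


def run_for_master(zk, session, hostinfo):
    """Try to become master; otherwise watch /master and retry later."""
    try:
        zk.create_ephemeral(session, "/master", hostinfo)
        return True
    except NodeExists:
        zk.get_data("/master",
                    watch=lambda _path: run_for_master(zk, session, hostinfo))
        return False


zk = ToyZK()
first = run_for_master(zk, "session-a", "host-a")    # becomes master
second = run_for_master(zk, "session-b", "host-b")   # becomes backup
zk.end_session("session-a")                          # master goes away
current = zk.get_data("/master")                     # backup took over
```

A real client would also have to handle the znode vanishing between the failed create and the getdata call; the single-threaded toy cannot hit that race.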
[Figure: the master assigns a task to worker2 under /assign]
setdata("/assign/worker2", znode_of_task)
Part 4
Caveat Emptor
Revisit FLP and CAP
What should a master do when disconnected?
What is the consequence of acting as a master while disconnected?
[Timeline: P1 elected; P1 disconnected; P1 expires; P2 elected; P1 reconnects and gets the expired event]
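One conservative answer to the questions above is for a master to stop acting as soon as it is disconnected, since it cannot tell whether its session is still alive. The slides do not prescribe this policy; the sketch below is an assumption, and MasterProcess and its event handlers are hypothetical names, not callbacks of the real client library.

```python
class MasterProcess:
    """Toy client-side state for the timeline above (assumed policy)."""

    def __init__(self, name):
        self.name = name
        self.is_master = False   # what the process believes
        self.may_act = False     # whether it is safe to act on that belief

    def on_elected(self):
        self.is_master = True
        self.may_act = True

    def on_disconnected(self):
        # The session may expire while we are away: stop acting now.
        self.may_act = False

    def on_reconnected(self):
        # Session survived: resume only if we were the master.
        self.may_act = self.is_master

    def on_expired(self):
        # The session and its ephemeral znodes are gone for good.
        self.is_master = False
        self.may_act = False


p1, p2 = MasterProcess("P1"), MasterProcess("P2")

p1.on_elected()
p1.on_disconnected()
acted_while_disconnected = p1.may_act

p2.on_elected()    # service expired P1's session, P2 wins the election
p1.on_expired()    # P1 reconnects and gets the expired event

acting = [p.name for p in (p1, p2) if p.may_act]
```

With this policy there is never a moment in the timeline where both P1 and P2 consider it safe to act as master.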
Make sure you test [...] and SessionExpiration