How to Configure a 3-Node Zookeeper Cluster Without Root Privileges

(I jump around a bit throughout this post - I will attach a detailed script that highlights all the steps at the bottom.)

In this article, we will work through the installation of Apache Zookeeper within the home directory of a non-privileged user on a CentOS 7 system.

This tutorial is part of a larger set of articles on how to effectively utilize zookeeper as part of a distributed configuration management application called dman.

We will assume that the appropriate user has been created on the three machines, that Java is installed, and that the appropriate firewall rules are in place for communication between these hosts across the following ports:
  • 2180
  • 2880
  • 3880
People experienced in zookeeper will notice that these are traditionally 2181:2888:3888 (the zookeeper defaults); however, as zookeeper adoption has become somewhat widespread in the distributed systems world, I always recommend a different port scheme on smaller systems. This altered port scheme ensures that your application does not unknowingly collide with another instance that may be present on the box at the OS level.
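
Since the firewall rules require elevated privileges, they are assumed to be in place before the non-privileged install begins (typically handled by whoever provisions the boxes). For reference, on a CentOS 7 host running firewalld, opening the altered ports might look something like this:

# Hypothetical provisioning step, run by an administrator ahead of time:
# open the altered ports on all three hosts
sudo firewall-cmd --permanent --add-port=2180/tcp   # client connections
sudo firewall-cmd --permanent --add-port=2880/tcp   # quorum (follower-to-leader) traffic
sudo firewall-cmd --permanent --add-port=3880/tcp   # leader election
sudo firewall-cmd --reload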

Since we are creating part of what is to be a larger application, I will start in the user's home directory and immediately create the application directory and the underlying sync directory that will house our personal instance:
[dman@thor ~]$ mkdir -p ~/dman/sync
[dman@thor ~]$ cd ./dman/sync/
[dman@thor sync]$ pwd
/home/dman/dman/sync
[dman@thor sync]$

My reasoning for creating the sync directory under the ~/dman folder is that I wish to build the application that will utilize the zookeeper instance in the directory above sync; if that is not your intent, you may omit it without consequence.

Once you are in the target directory of your choice, you can use wget to fetch the tarball from a mirror. We will be fetching version 3.4.13 for this example:
[dman@thor sync]$ wget http://apache.claz.org/zookeeper/current/zookeeper-3.4.13.tar.gz
--2019-03-30 15:47:17--  http://apache.claz.org/zookeeper/current/zookeeper-3.4.13.tar.gz
Resolving apache.claz.org (apache.claz.org)... 216.245.218.171
Connecting to apache.claz.org (apache.claz.org)|216.245.218.171|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 37191810 (35M) [application/x-gzip]
Saving to: ‘zookeeper-3.4.13.tar.gz’

100%[======================================>] 37,191,810  8.01MB/s   in 4.6s

2019-03-30 15:47:22 (7.66 MB/s) - ‘zookeeper-3.4.13.tar.gz’ saved [37191810/37191810]

[dman@thor sync]$ ll
total 36324
-rw-rw-r--. 1 dman dman 37191810 Jul 15  2018 zookeeper-3.4.13.tar.gz
[dman@thor sync]$
Once you have fetched the tarball, extraction is simply:
[dman@thor sync]$ tar xvf zookeeper-3.4.13.tar.gz
... lots of filenames that I won't bore you with ...
[dman@thor sync]$ mkdir data
[dman@thor sync]$ ll
total 36332
drwxrwxr-x.  2 dman dman     4096 Mar 30 16:10 data
drwxr-xr-x. 10 dman dman     4096 Jun 30  2018 zookeeper-3.4.13
-rw-rw-r--.  1 dman dman 37191810 Jul 15  2018 zookeeper-3.4.13.tar.gz
[dman@thor sync]$

So we will be using /home/dman/dman/sync/data as our "local" data directory. If we are working on our first server, we will go into the zookeeper-3.4.13/conf directory and alter the default configuration as follows:
[dman@thor sync]$ cd zookeeper-3.4.13
[dman@thor zookeeper-3.4.13]$ cd conf/
[dman@thor conf]$ cp zoo_sample.cfg zoo.cfg

Now, were it our intention to run a single-node instance, we could simply alter zoo.cfg to utilize /home/dman/dman/sync/data as its data directory (by default it points to a /tmp directory, which is definitely not what you want) and run '$ZOOKEEPER_HOME/bin/zkServer.sh start' to start the service in standalone mode. However, we need to build the cluster with a bit more information for it to function correctly.
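
Either way, the first edits are the same: point dataDir at our directory and, under our altered port scheme, move clientPort to 2180. Assuming you are still in the conf directory, a quick way to make those edits in place (you could just as easily open zoo.cfg in an editor) is:

# Repoint dataDir away from the sample's /tmp location and move the client port
sed -i 's|^dataDir=.*|dataDir=/home/dman/dman/sync/data|' zoo.cfg
sed -i 's|^clientPort=.*|clientPort=2180|' zoo.cfg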

Each instance in a cluster requires data on its other members to properly sync its configuration. Assuming that we are using three servers, and those servers have static IP addresses as follows:
  • host 1: 10.0.0.101
  • host 2: 10.0.0.102
  • host 3: 10.0.0.103
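
Concretely, that membership information lives in zoo.cfg: each node's configuration carries one server.N line per cluster member, using the peer and leader-election ports from our altered scheme. A sketch of the lines to append, identical on all three hosts and assuming the addresses above:

# Appended to zoo.cfg on every host
# server.<id>=<host>:<peer port>:<leader election port>
server.1=10.0.0.101:2880:3880
server.2=10.0.0.102:2880:3880
server.3=10.0.0.103:2880:3880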

We need to ensure that each host is aware of its own designation in the cluster as well as the designations of its peers. If you are installing on host 1, the zookeeper instance will expect to find a file in its data directory which indicates its ID in the cluster. The file is called "myid" and it needs to contain only a single number, 1 through 3, which designates its ID in the cluster. Be aware that assigning a server in the cluster an ID of 1 does not necessarily imply that server will be elected as the leader of the cluster at runtime; the ID is for zookeeper's internal bookkeeping only.

You can create the myid file easily as follows:
[dman@thor sync]$ echo 1 > ./data/myid
[dman@thor sync]$ ll data
total 8
-rw-rw-r--. 1 dman dman    2 Mar 30 17:21 myid
drwxrwxr-x. 2 dman dman 4096 Mar 30 16:41 version-2
[dman@thor sync]$ cat data/myid
1
[dman@thor sync]$
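
With zoo.cfg and the myid file in place on all three hosts (hosts 2 and 3 holding 2 and 3 in their myid files respectively), each node can be started and its role in the quorum verified. Run on each host, it would look something like this:

# Start the local instance (zkServer.sh resolves its own paths, so it can be run from anywhere)
~/dman/sync/zookeeper-3.4.13/bin/zkServer.sh start

# Once all three nodes are up, each should report itself as leader or follower
~/dman/sync/zookeeper-3.4.13/bin/zkServer.sh status

# Optionally, attach a client through the altered client port
~/dman/sync/zookeeper-3.4.13/bin/zkCli.sh -server 10.0.0.101:2180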

Finally, the script mentioned at the top of the post begins by defining the environment it assumes:

#!/bin/bash

# The full script assumes the environment below
export SCRIPT_HOME=/home/dman
export ZOO_HOME=$SCRIPT_HOME/dman/sync
export ZOO_DATA=$ZOO_HOME/data
export ZOO_MYID=$ZOO_DATA/myid
export ZOO_VERS=zookeeper-3.4.13
export ZOO_SCFG=$ZOO_HOME/$ZOO_VERS/conf/zoo_sample.cfg
export ZOO_DCFG=$ZOO_HOME/$ZOO_VERS/conf/zoo.cfg
export ZOO_DLINK=http://apache.claz.org/zookeeper/current/zookeeper-3.4.13.tar.gz
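
# A sketch of chaining the steps from this post together with these variables;
# adjust the myid value below (and anything host-specific) on hosts 2 and 3
mkdir -p $ZOO_DATA
cd $ZOO_HOME
wget $ZOO_DLINK
tar xf $ZOO_VERS.tar.gz
cp $ZOO_SCFG $ZOO_DCFG

# Point the data directory and client port at our non-default locations
sed -i "s|^dataDir=.*|dataDir=$ZOO_DATA|" $ZOO_DCFG
sed -i "s|^clientPort=.*|clientPort=2180|" $ZOO_DCFG

# Cluster membership, identical on all three hosts
echo "server.1=10.0.0.101:2880:3880" >> $ZOO_DCFG
echo "server.2=10.0.0.102:2880:3880" >> $ZOO_DCFG
echo "server.3=10.0.0.103:2880:3880" >> $ZOO_DCFG

# This node's ID in the cluster (use 2 or 3 on the other hosts)
echo 1 > $ZOO_MYID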