Load Balancing or Dynamic Discovery of HiveServer2 Connections from Beeline or Hive Shell

To provide high availability and load balancing for HiveServer2 (HS2), Hive offers a feature called dynamic service discovery, in which multiple HiveServer2 instances register themselves with ZooKeeper. Instead of connecting to a specific HiveServer2 directly, clients connect to ZooKeeper, which returns a randomly selected registered HiveServer2 instance.

For example, the following command connects directly to HiveServer2 on machineA:

beeline -u "jdbc:hive2://machineA:10000"

The following command connects to the ZooKeeper ensemble instead, which selects one of the available HiveServer2 instances for the connection:

beeline -u "jdbc:hive2://machineA:2181,machineB:2181,machineC:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-mob-batch?tez.queue.name=myyarnqueue"
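As a rough sketch of how the discovery URL above is put together, the shell function below assembles it from a ZooKeeper quorum string, a namespace and an optional Hive conf list. The function name build_discovery_url is hypothetical; the hosts, namespace and queue are the sample values from this post.

```shell
# Hypothetical helper: assemble a ZooKeeper-discovery JDBC URL for beeline.
# $1 = comma-separated ZooKeeper host:port list
# $2 = ZooKeeper namespace that the HiveServer2 instances register under
# $3 = optional Hive conf list appended after '?' (for example a Tez queue)
build_discovery_url() {
    url="jdbc:hive2://$1/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=$2"
    if [ -n "${3:-}" ]; then
        url="$url?$3"
    fi
    printf '%s\n' "$url"
}

discovery_url=$(build_discovery_url \
    "machineA:2181,machineB:2181,machineC:2181" \
    "hiveserver2-mob-batch" \
    "tez.queue.name=myyarnqueue")
echo "$discovery_url"

# Once the printed URL looks right, pass it to beeline:
# beeline -u "$discovery_url"
```

Keeping the quorum and namespace as arguments makes it easy to reuse the same function for other namespaces.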

We can create the znode in ZooKeeper as follows.

Open the ZooKeeper command line interface:

zookeeper-client

Connect to the ZooKeeper server:

connect machineA:2181,machineB:2181,machineC:2181

Create the znode for the namespace:

create /hiveserver2-mob-batch

Manually register each HS2 instance with ZooKeeper under the namespace:

create /hiveserver2-mob-batch/serverUri=machineA:10000;version=3.1.3000.7.1.2.0-96;sequence=0000000082
create /hiveserver2-mob-batch/serverUri=machineB:10000;version=3.1.3000.7.1.2.0-96;sequence=0000000081
create /hiveserver2-mob-batch/serverUri=machineC:10000;version=3.1.3000.7.1.2.0-96;sequence=0000000051

Verify the namespace by listing its children:

ls /hiveserver2-mob-batch
[serverUri=machineC:10000;version=3.1.3000.7.1.2.0-96;sequence=0000000051, serverUri=machineA:10000;version=3.1.3000.7.1.2.0-96;sequence=0000000082, serverUri=machineB:10000;version=3.1.3000.7.1.2.0-96;sequence=0000000081]
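To check the listing without reading it by eye, the output can be split into one host:port per line. The sketch below works on a copy of the sample listing from this post; the helper name list_registered_hs2 and the list of expected hosts are our own assumptions.

```shell
# Sample `ls` output copied from this post; in practice, capture the real
# output of zookeeper-client instead.
ls_output='[serverUri=machineC:10000;version=3.1.3000.7.1.2.0-96;sequence=0000000051, serverUri=machineA:10000;version=3.1.3000.7.1.2.0-96;sequence=0000000082, serverUri=machineB:10000;version=3.1.3000.7.1.2.0-96;sequence=0000000081]'

# Hypothetical helper: print one registered host:port per line.
list_registered_hs2() {
    printf '%s\n' "$1" \
        | tr ',' '\n' \
        | sed -n 's/.*serverUri=\([^;]*\);.*/\1/p'
}

registered=$(list_registered_hs2 "$ls_output")
echo "$registered"

# Warn on stderr about any instance we expect but do not find.
for expected in machineA:10000 machineB:10000 machineC:10000; do
    if ! printf '%s\n' "$registered" | grep -qxF "$expected"; then
        echo "not registered: $expected" >&2
    fi
done
```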

To deregister a particular HiveServer2, run the following command in the ZooKeeper command line interface:

delete /hiveserver2-mob-batch/serverUri=machineC:10000;version=3.1.3000.7.1.2.0-96;sequence=0000000051

After you deregister a HiveServer2 from ZooKeeper, ZooKeeper no longer returns that instance for new client connections. However, active client sessions are not affected by the deregistration.

To deregister all HiveServer2 instances of a particular version, run the following command from the command line:

hive --service hiveserver2 --deregister <version_number>
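To find out which version strings are currently registered, one option is to pull them out of the znode names. This is a minimal sketch that assumes the version field in the znode name is the value to pass to --deregister; the helper name list_registered_versions is hypothetical, and the loop only prints the commands instead of running them.

```shell
# Sample `ls` output copied from this post.
znode_listing='[serverUri=machineC:10000;version=3.1.3000.7.1.2.0-96;sequence=0000000051, serverUri=machineA:10000;version=3.1.3000.7.1.2.0-96;sequence=0000000082, serverUri=machineB:10000;version=3.1.3000.7.1.2.0-96;sequence=0000000081]'

# Hypothetical helper: print each distinct version found in the listing.
list_registered_versions() {
    printf '%s\n' "$1" \
        | tr ',' '\n' \
        | sed -n 's/.*;version=\([^;]*\);.*/\1/p' \
        | sort -u
}

versions=$(list_registered_versions "$znode_listing")

# Print (do not run) the deregister command for each version found.
for v in $versions; do
    echo "hive --service hiveserver2 --deregister $v"
done
```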

Even after this manual configuration in ZooKeeper, we might still get an error like the one below when invoking beeline or hive:

22/04/11 19:43:31 [main-EventThread]: ERROR imps.EnsembleTracker: Invalid config event received: {server.1=machineA:3181:4181:participant, version=0, server.3=machineB:3181:4181:participant, server.2=machineC:3181:4181:participant}

Error: org.apache.hive.jdbc.ZooKeeperHiveClientException: Unable to read HiveServer2 configs from ZooKeeper (state=,code=0)

This is because the admin team needs to ensure that the following requirements are met for ZooKeeper discovery of HS2.

Configuration Requirements

1. Set hive.zookeeper.quorum to the ZooKeeper ensemble (a comma-separated list of the host:port pairs of the ZooKeeper servers running in the cluster).

2. Customize hive.zookeeper.session.timeout so that the connection between the HiveServer2's client and ZooKeeper is closed if a heartbeat is not received within the timeout period.

3. Set hive.server2.support.dynamic.service.discovery to true.

4. Set hive.server2.zookeeper.namespace to the value that you want to use as the root namespace on ZooKeeper. The default value is hiveserver2.

5. The administrator should ensure that the ZooKeeper service is running on the cluster, and that each HiveServer2 instance gets a unique host:port combination to bind to upon startup.
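These properties normally go into hive-site.xml on each HiveServer2 host. As a minimal sketch, assuming the sample quorum and namespace used in this post, the snippet below prints the corresponding property elements. The helper name hive_site_property is hypothetical, and the timeout value is only a placeholder, not a recommendation.

```shell
# Hypothetical helper: print one hive-site.xml <property> element.
hive_site_property() {
    printf '<property>\n  <name>%s</name>\n  <value>%s</value>\n</property>\n' "$1" "$2"
}

# Placeholder values; replace them with the real settings of the cluster.
zk_quorum="machineA:2181,machineB:2181,machineC:2181"
zk_namespace="hiveserver2-mob-batch"
zk_session_timeout="1200000"

hs2_discovery_config=$(
    hive_site_property hive.zookeeper.quorum "$zk_quorum"
    hive_site_property hive.zookeeper.session.timeout "$zk_session_timeout"
    hive_site_property hive.server2.support.dynamic.service.discovery "true"
    hive_site_property hive.server2.zookeeper.namespace "$zk_namespace"
)
echo "$hs2_discovery_config"
```

The printed elements are meant to be reviewed and pasted inside the configuration element of hive-site.xml by whoever administers the cluster.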

Reference: https://docs.cloudera.com/HDPDocuments/HDP2/HDP-2.3.0/bk_hadoop-ha/content/ha-hs2-rolling-upgrade.html

As developers, we applied the following hack to get a random HiveServer2 from ZooKeeper, thereby distributing load across the HS2 instances:

beeline -u "jdbc:hive2://$( ( echo "connect machineA:2181,machineB:2181,machineC:2181"; echo "ls /hiveserver2-mob-batch") | zookeeper-client | grep -oP '(?<=serverUri=).*?(?=;)'| shuf | head -1)/default;principal=hive/_HOST@MYDOMAIN"

What the above command does:

Opens zookeeper-client and runs:

connect machineA:2181,machineB:2181,machineC:2181
ls /hiveserver2-mob-batch

Parses the HS2 host:port values, that is, the text between serverUri= and ;
Shuffles all of them randomly (shuf)
Picks the first entry of the shuffled list (head -1)
Concatenates it into the JDBC URL (jdbc:hive2:// ...)
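The same steps can be written out as a small function, which is easier to test than the one-liner. This is only a sketch: the helper name pick_random_hs2 is hypothetical, it uses tr and sed for the extraction instead of grep -oP, and it runs against a copy of the sample listing so that it works without a cluster.

```shell
# Hypothetical helper: read zookeeper-client `ls` output on stdin and print
# one randomly chosen HS2 host:port.
pick_random_hs2() {
    tr ',' '\n' \
        | sed -n 's/.*serverUri=\([^;]*\);.*/\1/p' \
        | shuf \
        | head -n 1
}

# Sample listing copied from this post.
sample_listing='[serverUri=machineC:10000;version=3.1.3000.7.1.2.0-96;sequence=0000000051, serverUri=machineA:10000;version=3.1.3000.7.1.2.0-96;sequence=0000000082, serverUri=machineB:10000;version=3.1.3000.7.1.2.0-96;sequence=0000000081]'

hs2=$(printf '%s\n' "$sample_listing" | pick_random_hs2)
echo "jdbc:hive2://$hs2/default;principal=hive/_HOST@MYDOMAIN"

# Against a live cluster, feed the real listing in instead:
# hs2=$( ( echo "connect machineA:2181,machineB:2181,machineC:2181"; \
#          echo "ls /hiveserver2-mob-batch" ) | zookeeper-client | pick_random_hs2 )
# beeline -u "jdbc:hive2://$hs2/default;principal=hive/_HOST@MYDOMAIN"
```

Each run may print a different host, which is what spreads new connections across the HS2 instances.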
