HBase Performance Optimization - Page 3

The previous article in this series is here: https://querydb.blogspot.com/2023/10/hbase-performance-optimization-page2.html

We determined that Spark jobs reading through org.apache.hadoop.hive.hbase.HBaseStorageHandler ran their scans with the following settings by default:

"cacheBlocks":true

"caching":-1

Because we run scans frequently, most of the HBase block cache was occupied by blocks from analytical tables. In addition, with caching at "-1" no explicit scanner caching was set, and in our setup this meant one RPC call for every row. For example, if table ABC has 30 million records, each full scan makes roughly 30 million calls.

We eventually found a solution: set the following properties on the Hive-on-HBase table.

alter table T1 set TBLPROPERTIES('hbase.scan.cacheblock'='false');

alter table T1 set TBLPROPERTIES('hbase.scan.cache'='1000');

With these properties set, blocks read by scans are no longer added to the block cache, and each RPC fetches up to 1,000 rows, which reduces the number of calls to HBase. For example, a full scan of table ABC with 30 million records needs only about 30,000 RPC calls instead of 30 million.
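The arithmetic behind that estimate can be checked with a short sketch. The helper name `estimated_scan_rpcs` is hypothetical, and the model is deliberately simple: it assumes every RPC returns a full batch of `caching` rows and ignores result-size limits and per-region overhead, so real counts will differ somewhat.

```python
def estimated_scan_rpcs(row_count: int, caching: int) -> int:
    """Rough count of scanner RPCs needed to read row_count rows.

    Assumption: each RPC returns up to `caching` rows; size limits and
    region boundaries are ignored, so this is only an estimate.
    """
    if caching < 1:
        raise ValueError("caching must be a positive row count")
    # Integer ceiling division: a partial last batch still costs one RPC.
    return (row_count + caching - 1) // caching


rows = 30_000_000  # size of the example table ABC from the text
print(estimated_scan_rpcs(rows, 1))     # one row per call
print(estimated_scan_rpcs(rows, 1000))  # 1,000 rows per call
```

A larger caching value lowers the call count further, but each response then holds more rows in memory on both the client and the region server, so the value is a trade-off rather than "bigger is better".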

The scan settings appear in the Spark application log, as shown below.

Before setting these properties:

{"startRow":"","stopRow":"","batch":-1,"cacheBlocks":true,"totalColumns":1,"maxResultSize":"-1","families":{"cf1":["name"]},"caching":-1,"maxVersions":1,"timeRange":["0","9223372036854775807"]}

After setting these properties:

{"startRow":"","stopRow":"","batch":-1,"cacheBlocks":false,"totalColumns":1,"maxResultSize":"-1","families":{"cf1":["name"]},"caching":1000,"maxVersions":1,"timeRange":["0","9223372036854775807"]}
