
Posts

Showing posts from March, 2023

Fixing hbck Inconsistencies

Execute 'hbck_chore_run' in the hbase shell to generate a new HBCK report.

Hole issue:
- Verify whether the region exists in both HDFS and meta.
- If it is not in HDFS, the data is lost or has already been cleared by cleaner_chore.
- If it is not in meta, use the hbck2 jar's reportMissingInMeta option to find the records missing from meta.
- Then use the addFsRegionsInMeta option to add the missing records back to meta.
- Then restart the active Master and assign those regions.

Orphan regions (refer to https://community.cloudera.com/t5/Support-Questions/Hbase-Orphan-Regions-on-Filesystem-shows-967-regions-in-set/td-p/307959):
- Do an "ls" on the region directory. If it contains "recovered.edits" but no HFile, the region was splitting and the split failed.
- Replay the edits using WALPlayer:
      hbase org.apache.hadoop.hbase.mapreduce.WALPlayer hdfs://bdsnameservice/hbase/data/Namespace/Table/57ed0b774aef9158cfda87c945a0afae/recovered.edits/0000000000001738473 Namespace:Table
- Move the orphan region to a temporary location and clean up.
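The hole and orphan-region steps above can be scripted roughly as follows. This is a minimal sketch, not a tested runbook: the default HBCK2_JAR path, the table name and the backup directory are hypothetical placeholders, and setting DRY_RUN=1 only prints the commands so you can review them first. The Master restart is left manual because how you restart it depends on your cluster manager.

```bash
#!/usr/bin/env bash
# Sketch of the hbck "hole" and "orphan region" fixes.
# ASSUMPTIONS: the default HBCK2_JAR path below is hypothetical; table names and
# backup locations are placeholders. Set DRY_RUN=1 to print commands instead of running them.

HBCK2_JAR="${HBCK2_JAR:-/opt/hbase-hbck2/hbase-hbck2.jar}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hole, step 1: report regions that exist in HDFS but are missing from hbase:meta,
# then add them back to meta.
add_missing_to_meta() {
  local table="$1"   # Namespace:Table
  run hbase hbck -j "$HBCK2_JAR" reportMissingInMeta "$table"
  run hbase hbck -j "$HBCK2_JAR" addFsRegionsInMeta "$table"
}

# Hole, step 2 (only after restarting the active HMaster by hand):
# assign the re-added regions by their encoded region names.
assign_regions() {
  run hbase hbck -j "$HBCK2_JAR" assigns "$@"
}

# Orphan region: replay one recovered.edits file with WALPlayer, then move the
# whole region directory to a temporary location for later cleanup.
replay_orphan() {
  local edits_file="$1"   # .../<encoded-region>/recovered.edits/<seqid>
  local table="$2"        # Namespace:Table
  local backup_dir="$3"   # hypothetical temporary HDFS directory
  local region_dir
  # strip "recovered.edits/<seqid>" to get the region directory
  region_dir="$(dirname "$(dirname "$edits_file")")"
  run hbase org.apache.hadoop.hbase.mapreduce.WALPlayer "$edits_file" "$table"
  run hdfs dfs -mv "$region_dir" "$backup_dir/"
}
```

For example, `DRY_RUN=1 add_missing_to_meta Namespace:Table` prints the two hbck2 invocations without touching the cluster; drop DRY_RUN once the output looks right.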

HBase Utility - Merging Regions in HBase

A growing HBase cluster, combined with the difficulty of procuring physical hardware, is something every enterprise deals with, and sometimes the ingest volume starts putting pressure on components before new hosts can be added. As I write this post, our cluster was running on 46 nodes, with each node hosting 600 regions. This is bad for performance. You can use the following formula to estimate the number of regions for a RegionServer:
(regionserver_memory_size) * (memstore_fraction) / ((memstore_size) * (num_column_families))
Reference: https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.0.0/hbase-data-access/content/memstore-size-regionservers.html
We noticed that HBase doesn't, by default, have an automatic process to reduce and merge regions. Over time, many small or empty regions form on the cluster, which degrades performance. While researching how to cope with this problem, we came across the following scripts:
- https://appsintheopen.com/posts/51-merge-empty-hbase-regions
- https
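To put numbers on the formula above, here is a small sketch that evaluates it. It assumes all sizes are in MB; memstore_fraction corresponds to hbase.regionserver.global.memstore.size (0.4 by default) and memstore_size to hbase.hregion.memstore.flush.size (128 MB by default). The heap sizes used in the examples are arbitrary illustrations, not our cluster's configuration.

```bash
#!/usr/bin/env bash
# Estimate regions per RegionServer:
#   (regionserver_memory_size * memstore_fraction) / (memstore_size * num_column_families)
# All sizes are in MB; the result is truncated to a whole number of regions.

estimate_regions() {
  local rs_memory_mb="$1"       # RegionServer heap size in MB
  local memstore_fraction="$2"  # share of heap for memstores, e.g. 0.4
  local memstore_mb="$3"        # memstore flush size in MB, e.g. 128
  local num_cf="$4"             # number of column families
  # bash has no floating point, so let awk do the division and truncate
  awk -v m="$rs_memory_mb" -v f="$memstore_fraction" -v s="$memstore_mb" -v c="$num_cf" \
    'BEGIN { printf "%d\n", (m * f) / (s * c) }'
}
```

With a 16 GB heap, the defaults and one column family, `estimate_regions 16384 0.4 128 1` prints 51. Working the same formula backwards, 600 regions per server would need about 600 * 128 / 0.4 = 192,000 MB (roughly 187.5 GB) of heap under those assumptions, which illustrates why 600 regions per node hurts performance.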

Refresh Multiple lines or Single line in Linux Shell

Below is an example of refreshing multiple lines in place using tput:

    while :; do
        echo "$RANDOM"
        echo "$RANDOM"
        echo "$RANDOM"
        sleep 0.2
        tput cuu1 # move cursor up by one line
        tput el   # clear the line
        tput cuu1
        tput el
        tput cuu1
        tput el
    done

Below is an example of refreshing (reprinting) the same line on STDOUT; the sleep keeps the loop from spinning at 100% CPU:

    while true; do echo -ne "$(date)\r"; sleep 1; done
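The multi-line tput loop above repeats cuu1/el once per line. A sketch of a generalization moves up N lines in one step and clears everything below. Note that it swaps tput for raw ANSI escape sequences (ESC[nA moves the cursor up n lines, ESC[J erases to the end of the screen), so it assumes an ANSI/VT100-compatible terminal; the tput equivalent would be `tput cuu "$n"; tput ed`. The draw_block payload is a hypothetical stand-in for whatever you print.

```bash
#!/usr/bin/env bash
# Redraw an N-line block in place with one cursor-up instead of N cuu1/el pairs.
# Uses raw ANSI escapes rather than tput, so it assumes an ANSI-compatible terminal.

# Move the cursor up $1 lines and erase from there to the end of the screen.
redraw_up() {
  printf '\033[%dA\033[J' "$1"
}

# Hypothetical payload: print three random numbers, one per line.
draw_block() {
  echo "$RANDOM"
  echo "$RANDOM"
  echo "$RANDOM"
}

# Redraw the block forever; stop with Ctrl-C.
refresh_loop() {
  local lines="$1"   # must match the number of lines draw_block prints
  while :; do
    draw_block
    sleep 0.2
    redraw_up "$lines"
  done
}
```

After sourcing the file, run `refresh_loop 3`. Changing the block height then only means changing one number instead of adding or removing cuu1/el pairs.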

Which one should I use - PrestoDB or Trino?

The first thing to understand is why one would use Presto or Trino at all. We had been running two clusters: a Hortonworks (HDP) variant and a Cloudera (CDP) variant. Hive tables built on HDP were mostly ORC, whereas our tables on CDP were mostly Parquet. We wanted to add ad-hoc querying functionality to our clusters, and we came across Apache Impala as an excellent tool for this purpose. Only CDP supported Apache Impala, and Impala was limited to working with Parquet, Kudu and HBase; before CDP 6.* it had no support for the ORC file format. That is how we came to know about PrestoDB, a distributed SQL engine for ad-hoc querying built at Facebook. It not only supported ORC but also had connectors for multiple data sources.
A bit of history of Presto:
- Developed at Facebook (2012).
- Supported by the Presto Foundation, established under the Linux Foundation (2019).
- The original developers and the Linux Foundation got into a conflict over naming and branding.
- Did a hard Fork o

OpenSSL AES-256 Encryption / Decryption Command Line and Java Code using Bouncy Castle

OpenSSL-encrypted files begin with an 8-byte signature: the ASCII characters "Salted__". The signature is followed by an 8-byte salt, and the encrypted data follows the salt. The salt and password are combined in a particular way to derive the encryption key and initialization vector. No information about which encryption cipher was used is stored in the file, so to decrypt it the cipher must be known by external means, or guessed. (Obviously, the same goes for the password.)
The above (key derivation via EVP_BytesToKey) is OpenSSL's old, deprecated mechanism, which is why one often sees the following warning when running OpenSSL commands:
*** WARNING : deprecated key derivation used. Using -iter or -pbkdf2 would be better.
Also note that the default message digest OpenSSL uses for key derivation changed from MD5 to SHA-256 (in OpenSSL 1.1.0). As a result, a file encrypted with one version of OpenSSL may fail to decrypt with another unless the digest is given explicitly with -md (for example, -md md5 for files produced by older versions). Also, refer to https://stackoverflow.com/qu
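The file layout and the -pbkdf2 recommendation above can be tried from the command line. This is a sketch under stated assumptions: OpenSSL 1.1.1 or newer (where -pbkdf2 and -iter are available), AES-256-CBC, 100000 PBKDF2 iterations, a pinned SHA-256 digest, and a password read from an environment variable named PASS, which is a hypothetical choice. show_salt reads the header with plain head/od, so it does not need OpenSSL at all.

```bash
#!/usr/bin/env bash
# Inspect and use the OpenSSL "Salted__" file format from the shell.
# ASSUMPTIONS: OpenSSL >= 1.1.1, AES-256-CBC, PBKDF2 with 100000 iterations,
# SHA-256 digest, password in the (hypothetical) PASS environment variable.

# Print the 8-byte salt of an OpenSSL-encrypted file as hex; fail if the magic is missing.
show_salt() {
  local file="$1"
  if [ "$(head -c 8 "$file")" != "Salted__" ]; then
    echo "no Salted__ header in $file" >&2
    return 1
  fi
  # bytes 9..16 hold the salt
  head -c 16 "$file" | tail -c 8 | od -An -tx1 | tr -d ' \n'
  echo
}

encrypt_file() {
  # -pbkdf2/-iter replace the deprecated EVP_BytesToKey derivation; -md pins the digest
  openssl enc -aes-256-cbc -salt -pbkdf2 -iter 100000 -md sha256 \
    -in "$1" -out "$2" -pass env:PASS
}

decrypt_file() {
  # decryption must repeat the same cipher, KDF, iteration count and digest
  openssl enc -d -aes-256-cbc -pbkdf2 -iter 100000 -md sha256 \
    -in "$1" -out "$2" -pass env:PASS
}
```

For example, `export PASS='secret'`, then `encrypt_file plain.txt secret.enc`, `show_salt secret.enc` (prints 16 hex digits) and `decrypt_file secret.enc plain.out`. Because the cipher, iteration count and digest are not stored in the file, any Java/Bouncy Castle decryptor has to be configured with exactly the same values.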