Posts

Spark Exception - Filtering is supported only on partition keys of type string

We got this exception while running SQL against a Hive table whose partition column is of type BIGINT.

Example -

select * from mytable where cast(rpt_date AS STRING) >= date_format(date_sub(current_date(),60),'yyyyMMdd')

Exception -

Caused by: java.lang.reflect.InvocationTargetException: org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported only on partition keys of type string
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:862)
  ... 101 more
Caused by: org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported only on partition keys of type string
  at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_by
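The fix itself is not part of this excerpt, but two mitigations are commonly applied for this MetaException; both are assumptions on our part rather than something stated above. Either allow the metastore to push filters on integral partition keys, or stop Spark from pushing the partition filter to the metastore altogether. A minimal sketch:

  -- Metastore side (hive-site.xml): allow filter pushdown on integral
  -- (INT/BIGINT) partition keys, then restart the metastore.
  --   hive.metastore.integral.jdo.pushdown=true

  -- Spark side: disable metastore-level partition pruning so Spark lists all
  -- partitions and prunes them client-side (works, but slower on huge tables).
  set spark.sql.hive.metastorePartitionPruning=false;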

PrestoDB (Trino) SQL Error - java.lang.UnsupportedOperationException: Storage schema reading not supported

We faced the following error while querying, via Trino, a Hive table defined on top of the AVRO file format.

Error -

java.lang.UnsupportedOperationException: Storage schema reading not supported

The solution is to set the following property in the Hive Metastore -

metastore.storage.schema.reader.impl=org.apache.hadoop.hive.metastore.SerDeStorageSchemaReader
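If the metastore is configured through an XML file (hive-site.xml, or metastore-site.xml on standalone Hive 3.x metastores), the same setting can be expressed as a property block; this is a sketch assuming an XML-based configuration and a metastore restart afterwards.

  <property>
    <name>metastore.storage.schema.reader.impl</name>
    <value>org.apache.hadoop.hive.metastore.SerDeStorageSchemaReader</value>
  </property>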

PrestoDB (Trino) SQL Error - ORC ACID file should have 6 columns, found Y

We faced this error while querying a Hive table using Trino -

Error -

SQL Error [16777223]: Query failed (#20230505_155701_00194_n2vdp): ORC ACID file should have 6 columns, found 17

This was happening because the table being queried was a Hive Managed (internal) table, which by default in the CDP (Cloudera) distribution is ACID compliant.

Now, in order for a Hive table to be ACID compliant, the underlying file format should be ORC, and there are a few requirements on the ORC file structure, such as the root column being a struct with 6 nested columns (which enclose the data and the type of operation). Something like below -

struct<
  operation: int,
  originalTransaction: bigint,
  bucket: int,
  rowId: bigint,
  currentTransaction: bigint,
  row: struct<...>
>

For more ORC ACID related internals, please take a look here - https://orc.apache.org/docs/acid.html

Now, the problem in our case was that though the Hive table was declared Internal
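To confirm from the Hive side whether the queried table is really registered as transactional, the table definition and its parameters can be inspected first; this is a sketch using a placeholder table name, not the table from the original incident.

  -- 'transactional'='true' under Table Parameters marks a full ACID managed table.
  DESCRIBE FORMATTED mytable;
  SHOW CREATE TABLE mytable;

If the files on disk are not actually ACID-structured, re-exposing the data through a plain external ORC table (rather than the managed ACID one) is a commonly used way to make it readable from Trino; the exact approach depends on how the table was originally loaded.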

Spark Exception - java.lang.NullPointerException

java.lang.NullPointerException
  at org.apache.spark.sql.execution.datasources.orc.OrcColumnVector.getUTF8String(OrcColumnVector.java:167)
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage46.processNext(Unknown Source)
  at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
  at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
  at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
  at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:187)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
  at org.apache.spark.scheduler.Task.run(Task.scala:109)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
  at
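This excerpt is only the stack trace, so the root cause is not visible here; the failure happens inside Spark's vectorized ORC reader (OrcColumnVector). One mitigation often tried in this situation, purely as an assumption on our part and not something stated in the post, is to fall back to the non-vectorized ORC path and re-run the query:

  -- Disable the vectorized ORC reader (spark.sql.orc.enableVectorizedReader
  -- defaults to true in Spark 2.3+) and retry the failing query.
  set spark.sql.orc.enableVectorizedReader=false;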

Fixing hbck Inconsistencies

Execute 'hbck_chore_run' in hbase shell to generate a new sub-report.

Hole issue:
- Verify whether the region exists in both HDFS and meta.
- If it is not in HDFS, it is data loss or was already cleared by cleaner_chore.
- If it is not in meta, we can use the hbck2 jar's reportMissingInMeta option to find the missing records in meta (see the hbck2 sketch after this list).
- Then use the addFsRegionsInMeta option to add the missing records back to meta.
- Then restart the Active Master and assign those regions.

Orphan Regions: Refer https://community.cloudera.com/t5/Support-Questions/Hbase-Orphan-Regions-on-Filesystem-shows-967-regions-in-set/td-p/307959
- Do "ls" to see "recovered.edits"; if there is no HFile, it means that the region was splitting and the split failed.
- Replay using WALPlayer:
  hbase org.apache.hadoop.hbase.mapreduce.WALPlayer hdfs://bdsnameservice/hbase/data/Namespace/Table/57ed0b774aef9158cfda87c945a0afae/recovered.edits/0000000000001738473 Namespace:Table
- Move the orphan region to some temporary location and clean up
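A minimal sketch of the hbck2 invocations referenced above, assuming the HBCK2 jar is available locally and using the placeholder Namespace:Table from the WALPlayer example; in recent HBCK2 releases the commands are named reportMissingRegionsInMeta and addFsRegionsMissingInMeta, and regions are then assigned with the assigns command.

  # Report regions present on the filesystem but missing from hbase:meta
  hbase hbck -j /path/to/hbase-hbck2.jar reportMissingRegionsInMeta Namespace:Table

  # Add the missing region records back to hbase:meta, then restart the Active Master
  hbase hbck -j /path/to/hbase-hbck2.jar addFsRegionsMissingInMeta Namespace:Table

  # Assign the regions that were added back (pass the encoded region names
  # printed by the previous command)
  hbase hbck -j /path/to/hbase-hbck2.jar assigns <encoded-region-name> ...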