Posts

Spark Exception - Filtering is supported only on partition keys of type string

We got this exception while running SQL against a Hive table whose partition column is of type BIGINT.

Example -

select * from mytable where cast(rpt_date AS STRING) >= date_format(date_sub(current_date(),60),'yyyyMMdd')

Exception -

Caused by: java.lang.reflect.InvocationTargetException: org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported only on partition keys of type string
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:862)
  ... 101 more
Caused by: org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported only on partition keys of type string
  at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_by
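The fix itself is not part of this excerpt, but two mitigations are commonly applied for this MetaException; both are assumptions on our part rather than something stated above. Either allow the metastore to push filters on integral partition keys, or stop Spark from pushing the partition filter to the metastore altogether. A minimal sketch:

  -- Metastore side (hive-site.xml): allow filter pushdown on integral
  -- (INT/BIGINT) partition keys, then restart the metastore.
  --   hive.metastore.integral.jdo.pushdown=true

  -- Spark side: disable metastore-level partition pruning so Spark lists all
  -- partitions and prunes them client-side (works, but slower on huge tables).
  set spark.sql.hive.metastorePartitionPruning=false;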

PrestoDB (Trino) SQL Error - java.lang.UnsupportedOperationException: Storage schema reading not supported

We faced the following error while querying, via Trino, a Hive table defined on top of the AVRO file format.

Error -

java.lang.UnsupportedOperationException: Storage schema reading not supported

The solution is to set the following property in the Hive Metastore -

metastore.storage.schema.reader.impl=org.apache.hadoop.hive.metastore.SerDeStorageSchemaReader
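If the metastore is configured through an XML file (hive-site.xml, or metastore-site.xml on standalone Hive 3.x metastores), the same setting can be expressed as a property block; this is a sketch assuming an XML-based configuration and a metastore restart afterwards.

  <property>
    <name>metastore.storage.schema.reader.impl</name>
    <value>org.apache.hadoop.hive.metastore.SerDeStorageSchemaReader</value>
  </property>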

PrestoDB (Trino) SQL Error - ORC ACID file should have 6 columns, found Y

We faced this error while querying a Hive table using Trino -

Error -

SQL Error [16777223]: Query failed (#20230505_155701_00194_n2vdp): ORC ACID file should have 6 columns, found 17

This was happening because the table being queried was a Hive Managed (internal) table, which by default in the CDP (Cloudera) distribution is ACID compliant.

Now, in order for a Hive table to be ACID compliant, the underlying file format should be ORC, and there are a few requirements on the ORC file structure, such as the root column being a struct with 6 nested columns (which enclose the data and the type of operation). Something like below -

struct<
  operation: int,
  originalTransaction: bigint,
  bucket: int,
  rowId: bigint,
  currentTransaction: bigint,
  row: struct<...>
>

For more ORC ACID related internals, please take a look here - https://orc.apache.org/docs/acid.html

Now, the problem in our case was that though the Hive table was declared Internal
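To confirm from the Hive side whether the queried table is really registered as transactional, the table definition and its parameters can be inspected first; this is a sketch using a placeholder table name, not the table from the original incident.

  -- 'transactional'='true' under Table Parameters marks a full ACID managed table.
  DESCRIBE FORMATTED mytable;
  SHOW CREATE TABLE mytable;

If the files on disk are not actually ACID-structured, re-exposing the data through a plain external ORC table (rather than the managed ACID one) is a commonly used way to make it readable from Trino; the exact approach depends on how the table was originally loaded.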

Spark Exception - java.lang.NullPointerException

java.lang.NullPointerException
  at org.apache.spark.sql.execution.datasources.orc.OrcColumnVector.getUTF8String(OrcColumnVector.java:167)
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage46.processNext(Unknown Source)
  at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
  at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
  at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
  at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:187)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
  at org.apache.spark.scheduler.Task.run(Task.scala:109)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
  at
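This excerpt is only the stack trace, so the root cause is not visible here; the failure happens inside Spark's vectorized ORC reader (OrcColumnVector). One mitigation often tried in this situation, purely as an assumption on our part and not something stated in the post, is to fall back to the non-vectorized ORC path and re-run the query:

  -- Disable the vectorized ORC reader (spark.sql.orc.enableVectorizedReader
  -- defaults to true in Spark 2.3+) and retry the failing query.
  set spark.sql.orc.enableVectorizedReader=false;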

Fixing hbck Inconsistencies

Execute 'hbck_chore_run' in hbase shell to generate a new sub-report.

Hole issue:
- Verify whether the region exists in both HDFS and meta.
- If it is not in HDFS, it is data loss or was already cleared by cleaner_chore.
- If it is not in meta, we can use the hbck2 jar's reportMissingInMeta option to find the missing records in meta (see the hbck2 sketch after this list).
- Then use the addFsRegionsInMeta option to add the missing records back to meta.
- Then restart the Active Master and assign those regions.

Orphan Regions: Refer https://community.cloudera.com/t5/Support-Questions/Hbase-Orphan-Regions-on-Filesystem-shows-967-regions-in-set/td-p/307959
- Do "ls" to see "recovered.edits"; if there is no HFile, it means that the region was splitting and the split failed.
- Replay using WALPlayer:
  hbase org.apache.hadoop.hbase.mapreduce.WALPlayer hdfs://bdsnameservice/hbase/data/Namespace/Table/57ed0b774aef9158cfda87c945a0afae/recovered.edits/0000000000001738473 Namespace:Table
- Move the orphan region to some temporary location and clean up
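A minimal sketch of the hbck2 invocations referenced above, assuming the HBCK2 jar is available locally and using the placeholder Namespace:Table from the WALPlayer example; in recent HBCK2 releases the commands are named reportMissingRegionsInMeta and addFsRegionsMissingInMeta, and regions are then assigned with the assigns command.

  # Report regions present on the filesystem but missing from hbase:meta
  hbase hbck -j /path/to/hbase-hbck2.jar reportMissingRegionsInMeta Namespace:Table

  # Add the missing region records back to hbase:meta, then restart the Active Master
  hbase hbck -j /path/to/hbase-hbck2.jar addFsRegionsMissingInMeta Namespace:Table

  # Assign the regions that were added back (pass the encoded region names
  # printed by the previous command)
  hbase hbck -j /path/to/hbase-hbck2.jar assigns <encoded-region-name> ...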