

Showing posts from July, 2020

Spark HBase Connector (SHC) - Unsupported Primitive datatype null

While writing a Spark DataFrame to an HBase table you may observe the following exception -

Caused by: java.lang.UnsupportedOperationException: PrimitiveType coder: unsupported data type null
        at org.apache.spark.sql.execution.datasources.hbase.types.PrimitiveType.toBytes(PrimitiveType.scala:61)
        at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation$$anonfun$org$apache$spark$sql$execution$datasources$hbase$HBaseRelation$$convertToPut$1$1.apply(HBaseRelation.scala:213)
        at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation$$anonfun$org$apache$spark$sql$execution$datasources$hbase$HBaseRelation$$convertToPut$1$1.apply(HBaseRelation.scala:209)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)

There are suggestions to upgrade the SHC-Core jar file, but that didn't work for us. Rather, it started giving the following error -

Caused by: org.apache.spark.sql.execution.datasources.hbase.InvalidRegionNumberE
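The excerpt is cut off before the eventual fix, but the exception itself is raised when SHC's PrimitiveType coder is asked to serialize a null value. The sketch below shows a typical SHC write path in Scala, assuming an illustrative catalog, HBase table name (demo_table) and source table (source_table); filling nulls with na.fill before the write is a common workaround for this error, not necessarily the one the post settles on.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog

// Minimal sketch, assuming hypothetical table and column names.
object ShcWriteSketch {
  val catalog: String =
    s"""{
       |  "table":   {"namespace": "default", "name": "demo_table"},
       |  "rowkey":  "key",
       |  "columns": {
       |    "key":  {"cf": "rowkey", "col": "key",  "type": "string"},
       |    "col1": {"cf": "cf1",    "col": "col1", "type": "string"}
       |  }
       |}""".stripMargin

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("shc-write-sketch").getOrCreate()
    val df = spark.table("source_table")   // hypothetical source table

    // Replace nulls before writing, since SHC's PrimitiveType coder
    // cannot serialize a null primitive. na.fill("") only touches
    // string columns; numeric columns would need their own fill value.
    val cleaned = df.na.fill("")

    cleaned.write
      .options(Map(
        HBaseTableCatalog.tableCatalog -> catalog,
        HBaseTableCatalog.newTable -> "5"))
      .format("org.apache.spark.sql.execution.datasources.hbase")
      .save()
  }
}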

Spark reading Parquet table gives Null records whereas it works from Hive

Spark reading a Parquet table gives NULL records whereas the same query works from Hive. Reading the Parquet table from Hive works -

select * from Table_Parq limit 7;
1
2
3
4
5
6
7

Whereas the same doesn't work with Spark -

select * from Table_Parq limit 7;
NULL
NULL
NULL
NULL
NULL
NULL
NULL

This may be because the Parquet file has a different schema than the Hive Metastore, for example column names that differ only in case.

Solution -
Read the Parquet file from HDFS rather than the Hive table, or,
Set the following properties -
set spark.sql.caseSensitive=false;
set spark.sql.hive.convertMetastoreParquet=false;
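A minimal Scala sketch of the second workaround: set the two properties on the Spark session before querying the table. The table name Table_Parq comes from the post; the HDFS path in the final comment is hypothetical.

import org.apache.spark.sql.SparkSession

// Minimal sketch, assuming a Hive table Table_Parq backed by Parquet files
// whose column names differ only in case from the Metastore schema.
object ParquetCaseSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("parquet-case-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Match column names case-insensitively and let Hive's own SerDe read
    // the Parquet files instead of Spark's built-in Parquet reader.
    spark.conf.set("spark.sql.caseSensitive", "false")
    spark.conf.set("spark.sql.hive.convertMetastoreParquet", "false")

    spark.sql("select * from Table_Parq limit 7").show()

    // Alternative from the post: read the Parquet files directly from HDFS.
    // spark.read.parquet("hdfs:///path/to/table_parq").show(7)
  }
}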