While writing a Spark DataFrame to an HBase table with the SHC connector, you may observe the following exception -
Caused by: java.lang.UnsupportedOperationException: PrimitiveType coder: unsupported data type null
at org.apache.spark.sql.execution.datasources.hbase.types.PrimitiveType.toBytes(PrimitiveType.scala:61)
at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation$$anonfun$org$apache$spark$sql$execution$datasources$hbase$HBaseRelation$$convertToPut$1$1.apply(HBaseRelation.scala:213)
at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation$$anonfun$org$apache$spark$sql$execution$datasources$hbase$HBaseRelation$$convertToPut$1$1.apply(HBaseRelation.scala:209)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
A common suggestion is to upgrade the shc-core jar. That didn't work for us; instead, it started throwing the following error -
Caused by: org.apache.spark.sql.execution.datasources.hbase.InvalidRegionNumberException: Number of regions specified for new table must be greater than 3.
at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation.createTable(HBaseRelation.scala:168)
at org.apache.spark.sql.execution.datasources.hbase.DefaultSource.createRelation(HBaseRelation.scala:60)
at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:471)
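This second error comes from SHC's table-creation path: when SHC has to create the target HBase table itself, it requires the number of regions to be greater than 3. If you do stay on the newer jar, that count can be passed via the HBaseTableCatalog.newTable option; a minimal sketch, where df and catalog are placeholders for your own DataFrame and SHC catalog JSON:

```scala
import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog

// Sketch only: `df` and `catalog` are placeholders.
// "newtable" is SHC's option for the region count of a newly created
// table; the check above rejects anything not greater than 3.
df.write
  .options(Map(
    HBaseTableCatalog.tableCatalog -> catalog,
    HBaseTableCatalog.newTable -> "5"
  ))
  .format("org.apache.spark.sql.execution.datasources.hbase")
  .save()
```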
Solution -
A workaround is to replace the null values in the DataFrame's columns with a default string and then write the DataFrame to HBase.
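A minimal end-to-end sketch of that workaround, assuming a hypothetical employee table, catalog, and input path (adjust all names to your own schema):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog

val spark = SparkSession.builder().appName("HBaseWrite").getOrCreate()

// Hypothetical SHC catalog: table, column family, and column names
// are placeholders.
val catalog =
  """{
    |  "table":{"namespace":"default", "name":"employee"},
    |  "rowkey":"key",
    |  "columns":{
    |    "id":{"cf":"rowkey", "col":"key", "type":"string"},
    |    "name":{"cf":"info", "col":"name", "type":"string"},
    |    "city":{"cf":"info", "col":"city", "type":"string"}
    |  }
    |}""".stripMargin

val df = spark.read.parquet("/path/to/input")  // hypothetical source

// Replace nulls with a sentinel string so PrimitiveType.toBytes never
// receives a null value.
val cleaned = df.na.fill("NULL")

cleaned.write
  .options(Map(
    HBaseTableCatalog.tableCatalog -> catalog,
    HBaseTableCatalog.newTable -> "5"
  ))
  .format("org.apache.spark.sql.execution.datasources.hbase")
  .save()
```

Note that na.fill with a string default only fills string-typed columns; if numeric columns can also contain nulls, call na.fill again with a numeric default (e.g. df.na.fill(0)).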