Spark HBase Connector CDP Issue - java.lang.ClassNotFoundException: org.apache.hadoop.hbase.spark.SparkSQLPushDownFilter
We wrote the following Spark (Scala) code to read data from HBase using the HBase Spark Connector:
// spark-shell (Scala): enter this block in :paste mode so the lines starting with "." continue the expression above them
val sql = spark.sqlContext
val df = sql.read.format("org.apache.hadoop.hbase.spark")
  .option("hbase.columns.mapping",
    "name STRING :key, email STRING c:email, " +
    "birthDate DATE p:birthDate, height FLOAT p:height")  // ":key" = row key; "c" and "p" = column families
  .option("hbase.table", "person")
  .option("hbase.spark.use.hbasecontext", false)
  .load()
df.createOrReplaceTempView("personView")
val results = sql.sql("SELECT * FROM personView")
results.show()
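The hbase.columns.mapping string maps each DataFrame column to an HBase location using the form "name TYPE family:qualifier", where an empty family with the qualifier key (":key") marks the row key. As a rough illustration only, the hypothetical parser below (ColumnMapping and MappingSketch are our own names, not part of the connector) splits the mapping used in this post into its parts; the connector's real parser handles more cases than this.

```scala
// Hypothetical illustration of the "name TYPE family:qualifier" mapping syntax.
// This is NOT the connector's parser; it only handles the simple form used in this post.
final case class ColumnMapping(
    sparkName: String,
    sparkType: String,
    family: String,   // empty for the row key
    qualifier: String // "key" when this entry is the row key
) {
  def isRowKey: Boolean = family.isEmpty && qualifier == "key"
}

object MappingSketch {
  // Split the comma-separated entries, then each entry into name, type and family:qualifier.
  def parse(mapping: String): Seq[ColumnMapping] =
    mapping.split(",").toSeq.map(_.trim).filter(_.nonEmpty).map { entry =>
      entry.split("\\s+") match {
        case Array(name, tpe, location) =>
          val idx = location.indexOf(':')
          require(idx >= 0, s"expected family:qualifier in '$entry'")
          ColumnMapping(name, tpe, location.substring(0, idx), location.substring(idx + 1))
        case _ =>
          throw new IllegalArgumentException(s"cannot parse mapping entry '$entry'")
      }
    }

  def main(args: Array[String]): Unit =
    parse("name STRING :key, email STRING c:email, " +
      "birthDate DATE p:birthDate, height FLOAT p:height").foreach(println)
}
```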
The code above works fine. However, if we add a WHERE clause to the SQL query, it fails with the error below:
val results = sql.sql("SELECT * FROM personView where name='Jaiganesh'")
results.show()
Error -
Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.DoNotRetryIOException): org.apache.hadoop.hbase.DoNotRetryIOException: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.spark.SparkSQLPushDownFilter
at org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.toFilter(ProtobufUtil.java:1612)
at org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.toScan(ProtobufUtil.java:1157)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.newRegionScanner(RSRpcServices.java:3039)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3369)
at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42278)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.spark.SparkSQLPushDownFilter
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
at org.apache.hadoop.hbase.util.DynamicClassLoader.loadClass(DynamicClassLoader.java:147)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.toFilter(ProtobufUtil.java:1603)
... 8 more
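This appears to be a problem with the HBase Spark Connector. Refer to https://issues.apache.org/jira/browse/HBASE-22769. The stack trace shows that the exception is thrown on the HBase RegionServer (RSRpcServices.scan), not in Spark: with the WHERE clause, the connector pushes the predicate down as a SparkSQLPushDownFilter attached to the scan, and the RegionServer cannot load that class when it deserializes the scan (ProtobufUtil.toFilter calls Class.forName).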
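The failing call in the trace is Class.forName inside ProtobufUtil.toFilter. The small sketch below (ClasspathCheck is a hypothetical name) performs the same kind of lookup and reports whether a class can be loaded in the JVM it runs in. Running it in the Spark driver or executors says nothing about the RegionServer; to be meaningful it would have to run against the RegionServer's classpath (for example the one printed by the hbase classpath command, assuming your installation matches the running RegionServer). It also does not cover jars that HBase's DynamicClassLoader, which also appears in the trace, would load at runtime.

```scala
// Illustrative sketch: checks whether a class is loadable in the current JVM,
// using the same Class.forName lookup that fails in ProtobufUtil.toFilter.
object ClasspathCheck {
  /** Returns true if `className` can be found by `loader` (without initializing it). */
  def isLoadable(className: String, loader: ClassLoader = getClass.getClassLoader): Boolean =
    try {
      Class.forName(className, false, loader)
      true
    } catch {
      // ClassNotFoundException: class absent; LinkageError: class present but a dependency is not
      case _: ClassNotFoundException | _: LinkageError => false
    }

  def main(args: Array[String]): Unit = {
    val filterClass = "org.apache.hadoop.hbase.spark.SparkSQLPushDownFilter"
    println(s"$filterClass loadable in this JVM: ${isLoadable(filterClass)}")
  }
}
```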
The only workaround we could find was to add this option to the reader:
.option("hbase.spark.pushdown.columnfilter", false)
However, this disables pushdown of column filters to HBase, which we need for efficiency and performance, since more of the filtering then has to happen in Spark instead of on the RegionServers.
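If the workaround is used, it may help to keep the pushdown setting in one place so it can be switched back on once the RegionServers can load SparkSQLPushDownFilter. A minimal sketch, assuming the option keys used earlier in this post (HBaseReadOptions and forTable are hypothetical names): it builds the reader options as a Map[String, String], which Spark's DataFrameReader.options accepts.

```scala
// Hypothetical helper: collects the HBase connector read options from this post in one map.
// Values are strings because DataFrameReader.options takes a Map[String, String].
object HBaseReadOptions {
  def forTable(table: String, mapping: String, pushdownColumnFilter: Boolean): Map[String, String] =
    Map(
      "hbase.table" -> table,
      "hbase.columns.mapping" -> mapping,
      "hbase.spark.use.hbasecontext" -> "false",
      // "false" is the workaround; set to true again once the RegionServer side is fixed
      "hbase.spark.pushdown.columnfilter" -> pushdownColumnFilter.toString
    )

  def main(args: Array[String]): Unit =
    forTable("person", "name STRING :key, email STRING c:email", pushdownColumnFilter = false)
      .foreach { case (k, v) => println(s"$k = $v") }
}
```

In spark-shell this would be used as spark.read.format("org.apache.hadoop.hbase.spark").options(HBaseReadOptions.forTable("person", mapping, pushdownColumnFilter = false)).load(), where mapping is the same mapping string as above. Passing "false" as a string should be equivalent to .option(key, false), because the Boolean overload of option stores value.toString.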