
Posts

Spark: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.DoubleWritable cannot be cast to org.apache.hadoop.io.Text

Reason - This occurs because the underlying Parquet or ORC file stores the column as a Double, whereas the Hive table schema declares it as a String.

Exception -
Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.DoubleWritable cannot be cast to org.apache.hadoop.io.Text
at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector.getPrimitiveWritableObject(WritableStringObjectInspector.java:41)
at org.apache.spark.sql.hive.HiveInspectors$$anonfun$unwrapperFor$23.apply(HiveInspectors.scala:547)
at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$15.apply(TableReader.scala:426)
at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$15.apply(TableReader.scala:426)
at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:442)
at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:433)

Solution - One way is to correct the schema, e.g. update the Hive table's column type so that it matches the underlying file.
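To confirm which side is wrong before changing any DDL, a minimal spark-shell sketch can compare the two schemas; the database, table name and warehouse path below are hypothetical.

// Schema as written in the Parquet files themselves (hypothetical path).
val fromFiles = spark.read.parquet("/apps/hive/warehouse/mydb.db/my_table")
fromFiles.printSchema()   // e.g. the offending column shows up as double

// Schema as declared in the Hive Metastore (hypothetical table name).
val fromHive = spark.table("mydb.my_table")
fromHive.printSchema()    // e.g. the same column shows up as string - this mismatch triggers the ClassCastException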

Error in Installing Zeppelin Interpreter

Error - Caused by: org.sonatype.aether.transfer.ArtifactNotFoundException: Could not find artifact org.apache.zeppelin:zeppelin-shell:jar:0.8.0

Solution - Export the Maven repository override, then execute or install the interpreter -

export JAVA_TOOL_OPTIONS="-Dzeppelin.interpreter.dep.mvnRepo=http://insecure.repo1.maven.org/maven2/"
/usr/hdp/current/zeppelin-server/bin/install-interpreter.sh --name shell

Spark HBase Connector (SHC) - Unsupported Primitive datatype null

While writing a Spark DataFrame to an HBase table you may observe the following exception -

Caused by: java.lang.UnsupportedOperationException: PrimitiveType coder: unsupported data type null
at org.apache.spark.sql.execution.datasources.hbase.types.PrimitiveType.toBytes(PrimitiveType.scala:61)
at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation$$anonfun$org$apache$spark$sql$execution$datasources$hbase$HBaseRelation$$convertToPut$1$1.apply(HBaseRelation.scala:213)
at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation$$anonfun$org$apache$spark$sql$execution$datasources$hbase$HBaseRelation$$convertToPut$1$1.apply(HBaseRelation.scala:209)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)

There are suggestions to upgrade the shc-core jar file, but that didn't work for us; instead it started giving the following error -

Caused by: org.apache.spark.sql.execution.datasources.hbase.InvalidRegionNumberException
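The excerpt cuts off before the fix, but a commonly suggested workaround, offered here only as a sketch and not as this post's actual solution, is to replace null values in the DataFrame before the write, since SHC's PrimitiveType coder cannot encode null. The catalog, table and column names below are hypothetical.

import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog
import spark.implicits._

// Hypothetical catalog mapping a two-column DataFrame to an HBase table named "employee".
val catalog = s"""{
    |"table":{"namespace":"default", "name":"employee"},
    |"rowkey":"key",
    |"columns":{
      |"id":{"cf":"rowkey", "col":"key", "type":"string"},
      |"name":{"cf":"cf1", "col":"name", "type":"string"}
    |}
  |}""".stripMargin

// Sample data containing a null that would otherwise break PrimitiveType.toBytes.
val df = Seq(("1", "alice"), ("2", null)).toDF("id", "name")

// Replace nulls with a sentinel value before writing.
df.na.fill("")
  .write
  .options(Map(HBaseTableCatalog.tableCatalog -> catalog, HBaseTableCatalog.newTable -> "5"))
  .format("org.apache.spark.sql.execution.datasources.hbase")
  .save()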

Spark reading Parquet table gives Null records whereas it works from Hive

Spark reading a Parquet table gives NULL records whereas the same query works from Hive. Reading the table from Hive works -

select * from Table_Parq limit 7;
1
2
3
4
5
6
7

whereas the same query from Spark returns nulls -

select * from Table_Parq limit 7;
NULL
NULL
NULL
NULL
NULL
NULL
NULL

This can happen when the Parquet file's schema differs from the Hive Metastore schema, for example when the column names differ only in case.

Solution - Read the Parquet files from HDFS directly rather than through the Hive table, or set the following properties -

set spark.sql.caseSensitive=false;
set spark.sql.hive.convertMetastoreParquet=false;
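A minimal spark-shell sketch of both workarounds; the database, table name and HDFS path are hypothetical.

// Option 1: bypass the Metastore and read the Parquet files directly (hypothetical path).
val direct = spark.read.parquet("/apps/hive/warehouse/mydb.db/table_parq")
direct.show(7)

// Option 2: keep reading through the Hive table, but relax case sensitivity
// and fall back to Hive's own Parquet SerDe instead of Spark's built-in reader.
spark.conf.set("spark.sql.caseSensitive", "false")
spark.conf.set("spark.sql.hive.convertMetastoreParquet", "false")
spark.table("mydb.table_parq").show(7)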