If your Spark application is failing with the below error:
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable
at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableIntObjectInspector.get(WritableIntObjectInspector.java:36)
at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$5.apply(TableReader.scala:399)
at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$5.apply(TableReader.scala:399)
Analysis & Cause - This is a data reading error. It may be because ORC Data files have some column as Text (, or String). But, Hive table defined on top of that has column defined as Int.
Solution - Either update the datatype in the ORC files or in the Hive metadata, so that the two are in sync. Changing the Hive metadata is usually the quicker fix, since it does not rewrite any data.
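A hedged sketch of the metadata fix, with placeholder table and column names:

// Align the Hive-declared type with what the ORC files actually contain.
// Note: depending on the Spark version, Spark SQL may reject type changes
// via CHANGE COLUMN (older releases only allow comment changes); if so,
// run the identical HiveQL statement from the Hive shell instead.
spark.sql("ALTER TABLE my_db.my_table CHANGE COLUMN user_id user_id STRING")

Going the other way (keeping int in Hive and rewriting the ORC files with the column cast to int) has to be done from a reader that can still read the table, e.g. the Hive shell, since Spark itself fails on the read.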
Also, to verify the above behavior, execute the same query from:
- Hive Shell - it should work fine.
- Spark Shell - it should fail with the ClassCastException above.
A sketch of this check follows.
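For example, assuming a hypothetical table my_db.my_table whose user_id column is declared int in Hive but stored as string in the ORC files:

// spark-shell (with Hive support): Spark deserializes through the Hive
// object inspector for the declared type (int), so this read fails with
// the ClassCastException shown at the top of this post.
spark.sql("SELECT user_id FROM my_db.my_table LIMIT 5").show()

// The same query issued from the Hive shell,
//   SELECT user_id FROM my_db.my_table LIMIT 5;
// returns rows, because Hive's ORC reader is more tolerant of this
// mismatch and reconciles the file schema with the table schema.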