Spark: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.DoubleWritable cannot be cast to org.apache.hadoop.io.Text
Reason -
This occurs because the underlying Parquet or ORC file stores the column as a double, while the Hive table schema declares the same column as a string. When Spark reads the table through the Hive SerDe, it tries to cast a DoubleWritable to Text and fails.
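For illustration, here is a minimal sketch of how such a mismatch can arise. The table name sales, the column amount, and the /tmp paths are hypothetical; whether the final query actually hits the Hive SerDe path depends on your Spark version and settings.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("double-vs-string-mismatch")
  .enableHiveSupport()
  .getOrCreate()
import spark.implicits._

// The Parquet files on disk store `amount` as a double ...
Seq((1, 10.5), (2, 20.0)).toDF("id", "amount")
  .write.mode("overwrite").parquet("/tmp/sales_data")

// ... but the Hive table declares that column as STRING.
spark.sql("""
  CREATE EXTERNAL TABLE IF NOT EXISTS sales (id INT, amount STRING)
  STORED AS PARQUET
  LOCATION '/tmp/sales_data'
""")

// Reading through the Hive SerDe can then fail with the ClassCastException below.
spark.sql("SELECT amount FROM sales").show()
```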
Exception -
Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.DoubleWritable cannot be cast to org.apache.hadoop.io.Text
    at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector.getPrimitiveWritableObject(WritableStringObjectInspector.java:41)
    at org.apache.spark.sql.hive.HiveInspectors$$anonfun$unwrapperFor$23.apply(HiveInspectors.scala:547)
    at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$15.apply(TableReader.scala:426)
    at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$15.apply(TableReader.scala:426)
    at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:442)
    at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:433)
Solution -
One way is to correct the schema: either update the Hive table schema or rewrite the ORC/Parquet files with the correct data type, as sketched below.
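A sketch of both variants, reusing the hypothetical sales table, amount column, and paths from the example above (and assuming the same Hive-enabled spark session). Depending on your Hive and Spark versions, the ALTER TABLE may need to be run from the Hive CLI or beeline rather than through spark.sql.

```scala
import org.apache.spark.sql.functions.col

// Option A: fix the Hive metastore schema so it matches the double type
// actually stored in the files (hypothetical table/column names).
spark.sql("ALTER TABLE sales CHANGE COLUMN amount amount DOUBLE")

// Option B: rewrite the data with the type the table declares,
// casting the column before writing the files back to a new location.
spark.read.parquet("/tmp/sales_data")
  .withColumn("amount", col("amount").cast("string"))
  .write.mode("overwrite").parquet("/tmp/sales_data_fixed")
```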
Another way to bypass this error is to set one of the following properties (see the sketch after this list) -
- ORC - set spark.sql.hive.convertMetastoreOrc=true; (note: this property works with Spark 1.6 and may not work with Spark 2)
- PARQUET - set spark.sql.hive.convertMetastoreParquet=true;
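These flags make Spark read the files with its own Parquet/ORC readers instead of the Hive SerDe. A sketch of setting them, either per session via SQL or when building the SparkSession (the builder shown is just one way to pass them; effectiveness depends on the Spark version, as noted above):

```scala
import org.apache.spark.sql.SparkSession

// Per-session, via SQL (assuming an existing Hive-enabled session named `spark`).
spark.sql("SET spark.sql.hive.convertMetastoreOrc=true")
spark.sql("SET spark.sql.hive.convertMetastoreParquet=true")

// Or set them when building the session; the same keys can also be passed
// on spark-submit with --conf spark.sql.hive.convertMetastoreOrc=true etc.
val sparkWithFlags = SparkSession.builder()
  .config("spark.sql.hive.convertMetastoreOrc", "true")
  .config("spark.sql.hive.convertMetastoreParquet", "true")
  .enableHiveSupport()
  .getOrCreate()
```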