Spark reading a Parquet table gives NULL records whereas it works from Hive.
Reading a Parquet table from Hive works:
select * from Table_Parq limit 7;
1
2
3
4
5
6
7
Whereas the same query doesn't work with Spark:
select * from Table_Parq limit 7;
NULL
NULL
NULL
NULL
NULL
NULL
NULL
This is likely because the Parquet file's schema differs from the Hive Metastore schema, for example when column names differ only in case: Spark's built-in Parquet reader is case sensitive, while the Hive Metastore stores column names in lower case, so the columns fail to match and come back as NULL.
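To confirm the mismatch, one can compare the schema Spark infers from the Parquet files with the schema the Metastore reports. A minimal spark-shell sketch; Table_Parq and the HDFS path are placeholders for the real table and location:

// Schema as recorded in the Hive Metastore (column names lower-cased)
val fromMetastore = spark.table("Table_Parq").schema
// Schema as actually written into the Parquet files (original case)
val fromFiles = spark.read.parquet("hdfs:///path/to/table_parq").schema

// Column names that differ only in case will show up here
println(fromMetastore.fieldNames.mkString(", "))
println(fromFiles.fieldNames.mkString(", "))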
Solution -
- Read the Parquet file directly from HDFS instead of through the Hive table, or
- Set the following properties (see the spark-shell sketch below):
  - set spark.sql.caseSensitive=false;
  - set spark.sql.hive.convertMetastoreParquet=false;
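In spark-shell, the two workarounds look roughly like this; the table name and path are again placeholders:

// Workaround 1: bypass the Metastore and read the Parquet files directly from HDFS
val df = spark.read.parquet("hdfs:///path/to/table_parq")
df.show(7)

// Workaround 2: keep querying through the Hive table, but make Spark fall back
// to the Hive SerDe instead of its built-in (case-sensitive) Parquet reader
spark.sql("set spark.sql.caseSensitive=false")
spark.sql("set spark.sql.hive.convertMetastoreParquet=false")
spark.sql("select * from Table_Parq limit 7").show()

Note that spark.sql.hive.convertMetastoreParquet=false trades Spark's optimized Parquet reader for the Hive SerDe, so it can be slower; reading the files directly avoids that cost.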