With Spark 2.2, when inserting into a Hive table with SaveMode.Overwrite, data is getting duplicated. The Spark job does not fail in this case, which leads to duplicate data in the Hive table.
We analyzed this behavior and found that the existing data is not being deleted from the Hive partition because of a permission issue:
Caused by:
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException):
Permission denied: user=bd-prod, access=EXECUTE,
inode="/user/....":hive:hdfs:drwx------
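The sketch below shows the kind of write pattern that exhibits the issue; it is illustrative only, and the database, table, and query names are assumptions, not taken from the original job.

import org.apache.spark.sql.{SaveMode, SparkSession}

object OverwritePartitionExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("overwrite-partition-example")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical source data; in the real job this came from upstream processing.
    val df = spark.sql("SELECT * FROM staging_db.events_staging")

    // SaveMode.Overwrite is expected to delete the existing partition files
    // before writing the new ones. If the HDFS delete is denied (as in the
    // AccessControlException above), the old files can remain and the new
    // files are written alongside them, duplicating the data, while the job
    // still reports success.
    df.write
      .mode(SaveMode.Overwrite)
      .insertInto("prod_db.events")

    spark.stop()
  }
}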