Posts

Spark Error - missing part 0 of the schema, 2 parts are expected

Exception -

    Caused by: org.apache.spark.sql.AnalysisException: Could not read schema from the hive metastore because it is corrupted. (missing part 0 of the schema, 2 parts are expected).;

Analysis -

Check the table definition. In the table's TBLPROPERTIES you may find entries like these:

    spark.sql.sources.schema.numPartCols
    spark.sql.sources.schema.numParts
    spark.sql.sources.schema.part.0
    spark.sql.sources.schema.part.1
    spark.sql.sources.schema.part.2
    spark.sql.sources.schema.partCol.0
    spark.sql.sources.schema.partCol.1

The error is saying that part 1 of the schema is defined but part 0 is missing.

Solution -

Drop and re-create the table. If the table was partitioned, all partitions will have been removed, so restore them with either of the following (a sketch follows this list):

- MSCK REPAIR TABLE <db_name>.<table_name>
- ALTER TABLE <db_name>.<table_name> ADD PARTITION (<partition_name…
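A minimal sketch of how this could look from a Spark session (the database mydb, table mytable and partition column dt below are hypothetical):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("repair-table-schema")
      .enableHiveSupport()
      .getOrCreate()

    // Inspect the schema-related table properties mentioned in the analysis above.
    spark.sql("SHOW TBLPROPERTIES mydb.mytable").show(truncate = false)

    // After dropping and re-creating the table, re-register its partitions.
    // Option 1: let the metastore rediscover all partitions from the file system.
    spark.sql("MSCK REPAIR TABLE mydb.mytable")

    // Option 2: add a known partition explicitly.
    spark.sql("ALTER TABLE mydb.mytable ADD PARTITION (dt='2020-01-01')")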

java.lang.NoClassDefFoundError: org/apache/hbase/thirdparty/com/google/common/cache/CacheLoader

User class threw exception:

    java.lang.NoClassDefFoundError: org/apache/hbase/thirdparty/com/google/common/cache/CacheLoader
        at org.apache.spark.sql.execution.datasources.hbase.HBaseConnectionKey.liftedTree1$1(HBaseConnectionCache.scala:188)
        at org.apache.spark.sql.execution.datasources.hbase.HBaseConnectionKey.<init>(HBaseConnectionCache.scala:187)
        at org.apache.spark.sql.execution.datasources.hbase.HBaseConnectionCache$.getConnection(HBaseConnectionCache.scala:144)
        at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation.createTableIfNotExist(HBaseRelation.scala:126)
        at org.apache.spark.sql.execution.datasources.hbase.DefaultSource.createRelation(HBaseRelation.scala:63)
        at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)

Reason - A ClassNotFoundException or NoClassDefFoundError occurs when a third-party…
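The usual remedy for this class of error is to make the missing jar available to both the driver and the executors. A minimal sketch, assuming the missing CacheLoader class comes from the HBase shaded third-party artifact (hbase-shaded-miscellaneous) and that the jar path below is hypothetical:

    import org.apache.spark.sql.SparkSession

    // Hypothetical location of the jar that contains
    // org.apache.hbase.thirdparty.com.google.common.cache.CacheLoader.
    val extraJar = "/path/to/hbase-shaded-miscellaneous-<version>.jar"

    val spark = SparkSession.builder()
      .appName("hbase-write-job")
      // Ship the jar to the driver and executors so the class resolves at runtime.
      .config("spark.jars", extraJar)
      .getOrCreate()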

Spark - java.lang.AssertionError: assertion failed

Spark SQL fails to read data from an ORC Hive table that has had a new column added to it, giving this exception:

    java.lang.AssertionError: assertion failed
        at scala.Predef$.assert(Predef.scala:165)
        at org.apache.spark.sql.execution.datasources.LogicalRelation$$anonfun$1.apply(LogicalRelation.scala:39)
        at org.apache.spark.sql.execution.datasources.LogicalRelation$$anonfun$1.apply(LogicalRelation.scala:38)
        at scala.Option.map(Option.scala:145)

This happens when the following property is set:

    spark.sql.hive.convertMetastoreOrc=true

Solution - Comment out the property if it is being set explicitly, or set it to false. Refer to https://issues.apache.org/jira/browse/SPARK-18355
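A minimal sketch of the workaround, disabling the property on the session (the table name used in the read is hypothetical):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("read-orc-hive-table")
      // Work around SPARK-18355: use the Hive ORC serde path instead of the
      // converted data source path, so the newly added column is handled.
      .config("spark.sql.hive.convertMetastoreOrc", "false")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical ORC Hive table that has had a column added.
    val df = spark.table("mydb.orc_table_with_new_column")
    df.show()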

org.apache.spark.sql.AnalysisException: Cannot overwrite a path that is also being read from.;

Caused by: org.apache.spark.sql.AnalysisException: Cannot overwrite a path that is also being read from.;
    at org.apache.spark.sql.execution.command.DDLUtils$.verifyNotReadPath(ddl.scala:906)
    at org.apache.spark.sql.execution.datasources.DataSourceAnalysis$$anonfun$apply$1.applyOrElse(DataSourceStrategy.scala:192)
    at org.apache.spark.sql.execution.datasources.DataSourceAnalysis$$anonfun$apply$1.applyOrElse(DataSourceStrategy.scala:134)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:266)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:256)
    at org.apache.spark.sql.execution.datasources.DataSourceAnalysis.apply(DataSourceStrategy.scala:134)
    at org.apache.spark.sql.execution.datasource…
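A minimal sketch of the pattern that raises this exception (the path and filter below are hypothetical): reading from a location and overwriting that same location within the same job.

    import org.apache.spark.sql.{SaveMode, SparkSession}

    val spark = SparkSession.builder()
      .appName("overwrite-read-path")
      .getOrCreate()

    // Hypothetical path; reading and overwriting the same location in one job
    // is what triggers the AnalysisException above.
    val path = "/data/events"

    val df = spark.read.parquet(path)
    val cleaned = df.filter("event_type IS NOT NULL")

    // Fails: `path` is still being read by the plan that feeds this write.
    cleaned.write.mode(SaveMode.Overwrite).parquet(path)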