
Spark - java.lang.AssertionError: assertion failed

Spark SQL fails to read data from an ORC Hive table after a new column has been added to it, throwing:

    java.lang.AssertionError: assertion failed
        at scala.Predef$.assert(Predef.scala:165)
        at org.apache.spark.sql.execution.datasources.LogicalRelation$$anonfun$1.apply(LogicalRelation.scala:39)
        at org.apache.spark.sql.execution.datasources.LogicalRelation$$anonfun$1.apply(LogicalRelation.scala:38)
        at scala.Option.map(Option.scala:145)

This happens when the following property is set:

    spark.sql.hive.convertMetastoreOrc=true

Solution - Comment out the property if it is being set explicitly, or set it to false. Refer to https://issues.apache.org/jira/browse/SPARK-18355. A session-level sketch follows below.
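As a minimal sketch (assuming the property is settable at runtime in your Spark version; the app name and table name db.orc_table are hypothetical placeholders), the conversion can be disabled on the current SparkSession before reading the table:

    // Disable Hive-to-data-source ORC conversion so Spark reads the table
    // through the Hive SerDe path, avoiding the LogicalRelation assertion.
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("orc-schema-evolution-workaround") // hypothetical app name
      .enableHiveSupport()
      .getOrCreate()

    spark.conf.set("spark.sql.hive.convertMetastoreOrc", "false")

    // "db.orc_table" is a placeholder for the ORC table with the added column.
    spark.sql("SELECT * FROM db.orc_table").show()

Equivalently, pass --conf spark.sql.hive.convertMetastoreOrc=false to spark-submit so the setting applies from application start.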

org.apache.spark.sql.AnalysisException: Cannot overwrite a path that is also being read from.;

Caused by: org.apache.spark.sql.AnalysisException: Cannot overwrite a path that is also being read from.;
    at org.apache.spark.sql.execution.command.DDLUtils$.verifyNotReadPath(ddl.scala:906)
    at org.apache.spark.sql.execution.datasources.DataSourceAnalysis$$anonfun$apply$1.applyOrElse(DataSourceStrategy.scala:192)
    at org.apache.spark.sql.execution.datasources.DataSourceAnalysis$$anonfun$apply$1.applyOrElse(DataSourceStrategy.scala:134)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:266)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:256)
    at org.apache.spark.sql.execution.datasources.DataSourceAnalysis.apply(DataSourceStrategy.scala:134)
    at org.apache.spark.sql.execution.datasource
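Spark raises this check at analysis time when a write in overwrite mode targets the same path (or table location) the plan is reading from. A minimal sketch of the failing pattern and one common workaround, staging the output elsewhere and then swapping directories (all paths here are hypothetical):

    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().getOrCreate()

    // Failing pattern: the overwrite targets the path the query reads from.
    val df = spark.read.parquet("/data/events")            // hypothetical input path
    // df.write.mode("overwrite").parquet("/data/events")  // -> this AnalysisException

    // Workaround sketch: write to a staging path, then swap directories.
    val src     = new Path("/data/events")
    val staging = new Path("/data/events_staging")         // hypothetical staging path

    df.write.mode("overwrite").parquet(staging.toString)

    val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
    fs.delete(src, true)     // recursively remove the old data
    fs.rename(staging, src)  // promote the staged output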

Kylin - Building a Cube with an MR Job throws java.lang.ArrayIndexOutOfBoundsException

Caused by: java.lang.ArrayIndexOutOfBoundsException
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1453)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1349)
    at java.io.DataOutputStream.writeInt(DataOutputStream.java:197)
    at org.apache.hadoop.io.BytesWritable.write(BytesWritable.java:188)

Solution - Set kylin.engine.mr.config-override.mapreduce.task.io.sort.mb to 1024.
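As a sketch, the override goes into kylin.properties; Kylin passes keys under the kylin.engine.mr.config-override. prefix through to the cube-building MapReduce jobs (1024 is the value suggested above):

    # kylin.properties
    # Enlarge the map-side sort buffer for cube-build MR jobs; the
    # ArrayIndexOutOfBoundsException above comes from the map output buffer.
    kylin.engine.mr.config-override.mapreduce.task.io.sort.mb=1024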

Spark 2 Application Errors & Solutions

Exception - Exception in thread "broadcast-exchange-0" java.lang.OutOfMemoryError: Not enough memory to build and broadcast

This is a driver-side exception. It can be solved by setting spark.sql.autoBroadcastJoinThreshold to -1 (which disables broadcast joins) or by increasing --driver-memory.

Exception - Container is running beyond physical memory limits. Current usage: X GB of Y GB physical memory used; X GB of Y GB virtual memory used. Killing container

YARN killed the container because it exceeded its memory limits. Increase --driver-memory and/or --executor-memory.

Exception - ERROR Executor: Exception in task 600 in stage X.X (TID 12345) java.lang.OutOfMemoryError: GC overhead limit exceeded

This means the executor JVM was spending more time on garbage collection than on actual execution. Possible remedies, combined in the sketch after this list:

- Disable this JVM check by adding -XX:-UseGCOverheadLimit.
- Increase executor memory with --executor-memory.
- Make the data more evenly distributed so it is not skewed onto one executor.
- Switch to the parallel collector with -XX:+UseParallelGC, or try another garbage collector.
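For illustration only, a spark-submit sketch that combines the settings above; the main class, jar, and memory sizes are hypothetical placeholders to be tuned per cluster:

    # Hypothetical spark-submit combining the fixes above:
    # larger driver/executor memory, broadcast joins disabled,
    # and the parallel collector on executors.
    spark-submit \
      --class com.example.MyApp \
      --master yarn \
      --driver-memory 8g \
      --executor-memory 8g \
      --conf spark.sql.autoBroadcastJoinThreshold=-1 \
      --conf "spark.executor.extraJavaOptions=-XX:+UseParallelGC" \
      my-app.jar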