Recently, we upgraded from HDP 3 to CDP 7, which involved upgrading Spark from 2.3 to 2.4.
We recompiled and rebuilt our jar against the new dependencies, but the code started failing with the following error -
java.util.NoSuchElementException: next on empty iterator
at scala.collection.Iterator$$anon$2.next(Iterator.scala:39)
at scala.collection.Iterator$$anon$2.next(Iterator.scala:37)
at scala.collection.IndexedSeqLike$Elements.next(IndexedSeqLike.scala:63)
at scala.collection.IterableLike$class.head(IterableLike.scala:107)
at scala.collection.mutable.ArrayBuffer.scala$collection$IndexedSeqOptimized$$super$head(ArrayBuffer.scala:48)
at scala.collection.IndexedSeqOptimized$class.head(IndexedSeqOptimized.scala:126)
at scala.collection.mutable.ArrayBuffer.head(ArrayBuffer.scala:48)
at org.apache.spark.sql.catalyst.optimizer.CollapseWindow$$anonfun$apply$13.applyOrElse(Optimizer.scala:736)
at org.apache.spark.sql.catalyst.optimizer.CollapseWindow$$anonfun$apply$13.applyOrElse(Optimizer.scala:731)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:282)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:282)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:71)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:281)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformUp(LogicalPlan.scala:29)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.transformUp(AnalysisHelper.scala:158)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformUp(LogicalPlan.scala:29)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformUp(LogicalPlan.scala:29)
We were not able to determine the root cause of the error. We tried setting multiple properties as described in https://spark.apache.org/docs/2.4.0/sql-migration-guide-upgrade.html#upgrading-from-spark-sql-23-to-24, but without success. Finally, we realized that Spark was not able to build the plan because it was dropping certain columns (possibly due to predicate pushdown), leaving an empty ArrayBuffer. Calling head on an empty ArrayBuffer then produces exactly the error above. For example -
scala> import scala.collection.mutable.ArrayBuffer
import scala.collection.mutable.ArrayBuffer
scala> val buff = new ArrayBuffer[String]()
buff: scala.collection.mutable.ArrayBuffer[String] = ArrayBuffer()
scala> buff.head
java.util.NoSuchElementException: next on empty iterator
at scala.collection.Iterator$$anon$2.next(Iterator.scala:39)
at scala.collection.Iterator$$anon$2.next(Iterator.scala:37)
at scala.collection.IndexedSeqLike$Elements.next(IndexedSeqLike.scala:63)
at scala.collection.IterableLike$class.head(IterableLike.scala:107)
at scala.collection.mutable.ArrayBuffer.scala$collection$IndexedSeqOptimized$$super$head(ArrayBuffer.scala:48)
at scala.collection.IndexedSeqOptimized$class.head(IndexedSeqOptimized.scala:126)
at scala.collection.mutable.ArrayBuffer.head(ArrayBuffer.scala:48)
... 49 elided
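In application code, the same failure mode can be avoided by using headOption (or an explicit non-empty check) instead of head. This is a small standalone Scala sketch, not taken from our actual code:

```scala
import scala.collection.mutable.ArrayBuffer

object HeadOptionDemo {
  def main(args: Array[String]): Unit = {
    val buff = new ArrayBuffer[String]()

    // head throws NoSuchElementException on an empty buffer;
    // headOption returns None instead of throwing.
    println(buff.headOption) // None

    buff += "first"
    println(buff.headOption) // Some(first)
  }
}
```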
Solution
To work around the problem, we persisted the DataFrame before calling further actions on the transformation, which resolved the error for us. However, this has a possible performance impact, since the persist is otherwise unnecessary.
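The workaround can be sketched as follows; df, the column names, and the window transformation here are hypothetical stand-ins for our actual pipeline, shown only to illustrate where the persist goes:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number

// Hedged sketch of the workaround: persist the intermediate DataFrame so that
// Spark materializes it rather than re-optimizing the full plan, which in our
// case was dropping columns the window expressions depended on.
def withRank(df: DataFrame): DataFrame = {
  val persisted = df.persist() // workaround for the CollapseWindow failure

  // "group_col" and "order_col" are placeholder column names.
  val w = Window.partitionBy("group_col").orderBy("order_col")
  persisted.withColumn("rank", row_number().over(w))
}

// Call persisted.unpersist() once downstream actions complete,
// to release the cached storage.
```

The persist breaks the single optimized plan into two stages, at the cost of caching data that would otherwise never be materialized.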
On further debugging, we were able to find the relevant code and an existing bug report for this issue, and we requested Cloudera to provide a fix for the same -