Spark - java.util.NoSuchElementException: next on empty iterator [SPARK-27514]

Recently, we upgraded from HDP 3 to CDP 7, which involved upgrading Spark from 2.3 to 2.4.

We recompiled and rebuilt our JAR against the new dependencies, but the code started failing with the following error:

23/02/09 16:47:44 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.util.NoSuchElementException: next on empty iterator
  at scala.collection.Iterator$$anon$2.next(Iterator.scala:39)
  at scala.collection.Iterator$$anon$2.next(Iterator.scala:37)
  at scala.collection.IndexedSeqLike$Elements.next(IndexedSeqLike.scala:63)
  at scala.collection.IterableLike$class.head(IterableLike.scala:107)
  at scala.collection.mutable.ArrayBuffer.scala$collection$IndexedSeqOptimized$$super$head(ArrayBuffer.scala:48)
  at scala.collection.IndexedSeqOptimized$class.head(IndexedSeqOptimized.scala:126)
  at scala.collection.mutable.ArrayBuffer.head(ArrayBuffer.scala:48)
  at org.apache.spark.sql.catalyst.optimizer.CollapseWindow$$anonfun$apply$13.applyOrElse(Optimizer.scala:736)
  at org.apache.spark.sql.catalyst.optimizer.CollapseWindow$$anonfun$apply$13.applyOrElse(Optimizer.scala:731)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:282)
  at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:71)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:281)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformUp(LogicalPlan.scala:29)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.transformUp(AnalysisHelper.scala:158)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformUp(LogicalPlan.scala:29)

At first we were not able to find the root cause of the error. We tried setting several of the properties described in the Spark SQL 2.3 to 2.4 migration guide: https://spark.apache.org/docs/2.4.0/sql-migration-guide-upgrade.html#upgrading-from-spark-sql-23-to-24

None of them helped. Eventually, we realized that Spark was failing while building the optimized plan: the trace points to the optimizer's CollapseWindow rule, and certain columns were being dropped (possibly due to predicate pushdown), which left an empty ArrayBuffer. Calling head on an empty ArrayBuffer throws exactly the error shown above. For example:

scala> import scala.collection.mutable.ArrayBuffer
import scala.collection.mutable.ArrayBuffer

scala> val buff = new ArrayBuffer[String]()
buff: scala.collection.mutable.ArrayBuffer[String] = ArrayBuffer()

scala> buff.head
java.util.NoSuchElementException: next on empty iterator
  at scala.collection.Iterator$$anon$2.next(Iterator.scala:39)
  at scala.collection.Iterator$$anon$2.next(Iterator.scala:37)
  at scala.collection.IndexedSeqLike$Elements.next(IndexedSeqLike.scala:63)
  at scala.collection.IterableLike$class.head(IterableLike.scala:107)
  at scala.collection.mutable.ArrayBuffer.scala$collection$IndexedSeqOptimized$$super$head(ArrayBuffer.scala:48)
  at scala.collection.IndexedSeqOptimized$class.head(IndexedSeqOptimized.scala:126)
  at scala.collection.mutable.ArrayBuffer.head(ArrayBuffer.scala:48)
  ... 49 elided
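The REPL session above can be turned into a small standalone sketch that contrasts the failing call with a defensive one. The object SafeHead and its two methods are hypothetical names used only for illustration; headOption is the standard Scala collection method that returns None for an empty collection instead of throwing. The exact exception message differs between Scala versions, so the sketch only relies on the exception type.

```scala
import scala.collection.mutable.ArrayBuffer

object SafeHead {
  // Same pattern as the failing frames: head throws NoSuchElementException on an empty buffer.
  def firstOrThrow(buff: ArrayBuffer[String]): String = buff.head

  // Defensive alternative: headOption yields None instead of throwing.
  def firstOption(buff: ArrayBuffer[String]): Option[String] = buff.headOption

  def main(args: Array[String]): Unit = {
    val empty = new ArrayBuffer[String]()
    println(firstOption(empty))            // None
    println(firstOption(ArrayBuffer("x"))) // Some(x)

    // The unguarded call still fails on an empty buffer; report the exception type only.
    val outcome =
      try { firstOrThrow(empty); "no exception" }
      catch { case e: NoSuchElementException => e.getClass.getSimpleName }
    println(outcome)
  }
}
```

In Spark's case the empty buffer is produced inside the optimizer, so user code cannot apply this guard directly; the sketch only shows why head on an empty collection fails.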

Solution

To work around the problem, we called persist() on the DataFrame before running further actions on the transformed result, which resolved the error for us. The downside is a possible performance cost from an otherwise unnecessary persist.
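The persist workaround can be sketched roughly as follows. This is a minimal sketch, assuming Spark 2.4 with spark-sql on the classpath and a local SparkSession. The object PersistWorkaround, the columns key, value, rnk and rn, and the two-window pipeline are hypothetical stand-ins for our actual job; we do not claim that this exact pipeline reproduces SPARK-27514. It only shows where the persist goes and how to release the cache afterwards.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{rank, row_number}
import org.apache.spark.storage.StorageLevel

object PersistWorkaround {
  // Hypothetical transformation: two window columns over the same window spec.
  def withWindows(df: DataFrame): DataFrame = {
    val w = Window.partitionBy("key").orderBy("value")
    df.withColumn("rnk", rank().over(w))
      .withColumn("rn", row_number().over(w))
  }

  // Persist the transformed DataFrame before running further actions on it,
  // then release the cache to limit the extra memory/disk cost.
  def countAfterPersist(spark: SparkSession): Long = {
    import spark.implicits._
    val input = Seq(("a", 1), ("a", 2), ("b", 3)).toDF("key", "value")
    val windowed = withWindows(input).persist(StorageLevel.MEMORY_AND_DISK)
    try {
      // Hypothetical downstream projection that no longer needs the window columns.
      windowed.select("key", "value").count()
    } finally {
      windowed.unpersist()
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("persist-workaround-sketch")
      .master("local[*]")
      .getOrCreate()
    try println(countAfterPersist(spark))
    finally spark.stop()
  }
}
```

Calling unpersist() once the downstream actions are done keeps the cost of the extra persist limited to the lifetime of the cached DataFrame.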

On further debugging, we found the relevant code and the upstream bug, and we asked Cloudera to provide a fix for it:

https://issues.apache.org/jira/browse/SPARK-27514

https://github.com/apache/spark/pull/24411

https://github.com/apache/spark/blob/branch-2.4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
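As we read the pull request linked above, the fix makes CollapseWindow skip merging two adjacent windows when either one has an empty list of window expressions, so head is never called on an empty collection. The toy model below illustrates that guard. WindowNode, canCollapseUnguarded and canCollapseGuarded are hypothetical names, not Spark's classes; each entry in windowExpressions simply stands for a window function's type, and the real rule also checks attribute references, which are omitted here.

```scala
object CollapseGuardSketch {
  // Hypothetical, heavily simplified stand-in for a logical Window node.
  // Each windowExpressions entry stands for one window function's type.
  final case class WindowNode(
      windowExpressions: Seq[String],
      partitionSpec: Seq[String],
      orderSpec: Seq[String])

  // Unguarded shape: calls head unconditionally, so an empty list throws.
  def canCollapseUnguarded(w1: WindowNode, w2: WindowNode): Boolean =
    w1.partitionSpec == w2.partitionSpec &&
      w1.orderSpec == w2.orderSpec &&
      w1.windowExpressions.head == w2.windowExpressions.head

  // Guarded shape: the nonEmpty checks short-circuit before head is reached.
  def canCollapseGuarded(w1: WindowNode, w2: WindowNode): Boolean =
    w1.partitionSpec == w2.partitionSpec &&
      w1.orderSpec == w2.orderSpec &&
      w1.windowExpressions.nonEmpty && w2.windowExpressions.nonEmpty &&
      w1.windowExpressions.head == w2.windowExpressions.head

  def main(args: Array[String]): Unit = {
    val pruned = WindowNode(Seq.empty, Seq("key"), Seq("value"))
    val full = WindowNode(Seq("SQL"), Seq("key"), Seq("value"))
    println(canCollapseGuarded(pruned, full)) // false
    println(canCollapseGuarded(full, full))   // true
  }
}
```

Because && evaluates left to right, the empty window is rejected before head is called, which mirrors why the patched rule no longer throws on a pruned window.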

QueryDB