org.apache.spark.SparkException: Kryo serialization failed: Buffer overflow

We were running an application which was leading to below error -

Job aborted due to stage failure: Task 137 in stage 5.0 failed 4 times, most recent failure: Lost task 137.3 in stage 5.0 (TID 2090, ncABC.hadoop.com, executor 1): org.apache.spark.SparkException: Kryo serialization failed: Buffer overflow. Available: 0, required: 59606960. To avoid this, increase spark.kryoserializer.buffer.max value.
	at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:330)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:456)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: com.esotericsoftware.kryo.KryoException: Buffer overflow. Available: 0, required: 59606960
	at com.esotericsoftware.kryo.io.Output.require(Output.java:167)
	at com.esotericsoftware.kryo.io.Output.writeBytes(Output.java:251)
	at com.esotericsoftware.kryo.io.Output.writeBytes(Output.java:237)
	at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.write(DefaultArraySerializers.java:49)
	at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.write(DefaultArraySerializers.java:38)
	at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:651)
	at com.twitter.chill.Tuple2Serializer.write(TupleSerializers.scala:37)
	at com.twitter.chill.Tuple2Serializer.write(TupleSerializers.scala:33)
	at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:651)

Solution -

On further analysis, we found that this error was originating from BroadcastNestedLoopJoin

at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:165)

at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:162)
at org.apache.spark.sql.execution.SparkPlan.executeBroadcast(SparkPlan.scala:150)
at org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec.doExecute(BroadcastNestedLoopJoinExec.scala:343)

So, we removed BroadcastNestedLoopJoin by updating SQL where clause having NOT IN to NOT EXISTS. Refer details here - http://querydb.blogspot.com/2021/06/spark-disable-broadcast-join-not.html

This solved our problem.

Alternative Solution -

We read this solution at multiple places but we didn't try to set below properties -

--conf  spark.kryoserializer.buffer.max=1024m  spark.kryoserializer.buffer=512m

And, don't recommend to set it anything other than default values because,

There is a note that there will be one buffer per core on each worker. This buffer will grow up to spark.kryoserializer.buffer.max if needed.

That said, if you have worker with 4 cores then 4*512m ~ 2GB is taken up for Kryo Buffer, and that seems a good chunk of memory.

QueryDB

Search This Blog

org.apache.spark.SparkException: Kryo serialization failed: Buffer overflow

Comments

Post a Comment

Popular posts

Spark MongoDB Connector Not leading to correct count or data while reading

Scala Spark building Jar leads java.lang.StackOverflowError

MongoDB Chunk size many times bigger than configure chunksize (128 MB)

AWS EMR Spark – Much Larger Executors are Created than Requested

Hive Count Query not working